Hindi and Tamil Question Answering / RAG
In this notebook, we use the new Navarasa LLM from Telugu-LLM-Labs to build a Hindi and Tamil Question Answering system. Since we're using a 7B model with PEFT, this notebook was run on Google Colab with an A100. If you're working with a smaller machine, I'd encourage you to try the 2B model instead.
| Time: 25 min | Level: Beginner |
|---|---|
| Author | Nirant Kasliwal |
!pip install -U fastembed datasets qdrant-client peft transformers accelerate bitsandbytes -qq
import numpy as np
from datasets import load_dataset
from peft import AutoPeftModelForCausalLM
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, VectorParams, Distance
from transformers import AutoTokenizer
from fastembed import TextEmbedding
hf_token = "<your_hf_token_here>" # Get your token from https://huggingface.co/settings/token, needed for Gemma weights
Setting Up
We'll download the dataset, the LLM weights, and the embedding model weights next.
embedding_model = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
model_id = "Telugu-LLM-Labs/Indic-gemma-2b-finetuned-sft-Navarasa"
ds = load_dataset("nirantk/chaii-hindi-and-tamil-question-answering", split="train")
ds
This dataset has questions and contexts, and each question has a corresponding answer that the LLM must find within the context. This is an extractive Question Answering problem.
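To get a feel for the data, we can peek at one row. The field names below are the same ones used later in this notebook:
sample = ds[0]
print(sample["question"])  # the question, in Hindi or Tamil
print(sample["context"][:300])  # contexts are long, so print only a slice
print(sample["answer_text"])  # the gold answer span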
In order to do this, we'll set up an embedding model from FastEmbed and then add the embedded contexts to Qdrant in in-memory mode, which is powered by NumPy.
embedding_model = TextEmbedding(model_name=embedding_model_name)
We'll use the 7B model here; the 2B model isn't great and struggled with reading comprehension.
Downloading the Navarasa LLM
We'll download the Navarasa LLM from Telugu-LLM-Labs. This is a 7B model fine-tuned with PEFT.
model = AutoPeftModelForCausalLM.from_pretrained(
model_id,
load_in_4bit=False,
token=hf_token,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
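If you're on a smaller GPU, a minimal sketch of a 4-bit load is below (assuming bitsandbytes, installed above, is available on your machine); it replaces the full-precision load above:
# Hedged alternative: quantize weights to 4-bit so the model fits on smaller GPUs
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    token=hf_token,
)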
Embed the Context into Vectors
questions, contexts = list(ds["question"]), list(ds["context"])
context_embeddings: list[np.ndarray] = list(
    embedding_model.embed(contexts)
)  # embed() returns a generator, so we materialize it with list()
len(context_embeddings[0])
def embed_text(text: str) -> np.ndarray:
    # embed() returns a generator even for a single string, so take the first result
    return list(embedding_model.embed(text))[0]
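As a quick sanity check (the sentence below is an arbitrary example), the multilingual mpnet model produces 768-dimensional vectors:
embed_text("नमस्ते, आप कैसे हैं?").shape  # expected: (768,)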
context_points = [
    PointStruct(id=idx, vector=emb.tolist(), payload={"text": text})  # PointStruct expects a plain list of floats
    for idx, (emb, text) in enumerate(zip(context_embeddings, contexts))
]
len(context_points[0].vector)
Insert into Qdrant
search_client = QdrantClient(":memory:")
search_client.create_collection(
collection_name="hindi_tamil_contexts",
vectors_config=VectorParams(size=len(context_points[0].vector), distance=Distance.COSINE),
)
search_client.upsert(collection_name="hindi_tamil_contexts", points=context_points)
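To confirm the upsert worked, we can count the stored points; the number should match the number of contexts:
search_client.count(collection_name="hindi_tamil_contexts")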
Selecting a Question
I've picked a specific question here by its index, and we then find the answer to it. We have the correct answer for it too -- so we can compare the two when you run the code.
idx = 997
question = questions[idx]
print(question)
search_context = search_client.search(
query_vector=embed_text(question), collection_name="hindi_tamil_contexts", limit=2
)
search_context_text = search_context[0].payload["text"]
len(search_context_text)
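Before handing the top context to the LLM, it can be helpful to inspect what was retrieved and how similar it was to the question:
for hit in search_context:
    print(hit.id, round(hit.score, 3), hit.payload["text"][:80])  # point id, cosine similarity, context preview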
Running the Model with a Question & Context
input_prompt = """
Answer the following question based on the context given after it in the same language as the question:
### Question:
{}
### Context:
{}
### Answer:
{}"""
input_text = input_prompt.format(
questions[idx], # question
search_context_text[:2000], # context
"", # output - leave this blank for generation!
)
inputs = tokenizer([input_text], return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, use_cache=True)
response = tokenizer.batch_decode(outputs)[0]
response.split(sep="### Answer:")[-1].replace("<eos>", "").strip()
ds[idx]["answer_text"]