
Usage With Qdrant

This notebook demonstrates how to use FastEmbed and Qdrant to perform vector search and retrieval. Qdrant is an open-source vector similarity search engine that is used to store, organize, and query collections of high-dimensional vectors.

We will use Qdrant to add a collection of documents to the engine and then query the collection to retrieve the most relevant documents.

It consists of the following sections:

  1. Setup: Installing necessary packages, including the Qdrant Client and FastEmbed.
  2. Importing Libraries: Importing FastEmbed and other libraries.
  3. Data Preparation: Example data and embedding generation.
  4. Querying: Defining a function to search documents based on a query.
  5. Running Queries: Running example queries.

Setup

First, we need to install the dependencies: fastembed to create embeddings and perform retrieval, and qdrant-client to interact with the Qdrant database.

!pip install 'qdrant-client[fastembed]' --quiet --upgrade

Importing the necessary libraries:

from qdrant_client import QdrantClient

Data Preparation

We define the example documents here. The embedding model is initialized and the embeddings are generated when the documents are added to the collection.

💡 Tip: Prefer using query_embed for queries and passage_embed for documents.

# Example list of documents
documents: list[str] = [
    "Maharana Pratap was a Rajput warrior king from Mewar",
    "He fought against the Mughal Empire led by Akbar",
    "The Battle of Haldighati in 1576 was his most famous battle",
    "He refused to submit to Akbar and continued guerrilla warfare",
    "His capital was Chittorgarh, which he lost to the Mughals",
    "He died in 1597 at the age of 57",
    "Maharana Pratap is considered a symbol of Rajput resistance against foreign rule",
    "His legacy is celebrated in Rajasthan through festivals and monuments",
    "He had 11 wives and 17 sons, including Amar Singh I who succeeded him as ruler of Mewar",
    "His life has been depicted in various films, TV shows, and books",
]

The following sections show how to use the QdrantClient to add documents to a collection and query the collection for relevant documents.

➕ Adding Documents

The add method creates a collection if it does not already exist. Now we can add the documents to the collection:

client = QdrantClient(":memory:")
client.add(collection_name="test_collection", documents=documents)
100%|██████████| 77.7M/77.7M [00:05<00:00, 14.6MiB/s]

['4fa8b10c78da4b18ba0830ba8a57367a',
 '2eae04b515ee4e9185a9a0e6be812bba',
 'c6039f88486f47f1835ae3b069c5823c',
 'c2c8c51e305144d1917b373125fb4d95',
 '79fd23b9ec0648cdab38d1947c6b933e',
 '036aa200d8c3492b8a438e4f825f5e7f',
 'c35c77f3ea37460a9a13723fb77b7367',
 '6ebccbca571b40d0ab6e83e5e0f2f562',
 '38048c2ccc1d4962a4f8f1bd89c8357a',
 'c6b09308360140c7b4f106af3658a31e']

These are the ids of the documents we just added. We don't have a use for them in this tutorial, but they can be used to update or delete documents.
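As a rough illustration of why the returned ids are worth keeping, the sketch below pairs each id with its document so that a specific document can be located later. It uses only the standard library. The `uuid.uuid4().hex` ids merely mimic the 32-character hex strings shown above (an assumption about their shape, not a statement about how the client generates them), and `id_to_doc` and `find_id` are hypothetical names.

```python
import uuid
from typing import Optional

documents = [
    "Maharana Pratap was a Rajput warrior king from Mewar",
    "He fought against the Mughal Empire led by Akbar",
]

# Assumption: stand-in ids with the same 32-character hex shape as the output above.
ids = [uuid.uuid4().hex for _ in documents]

# Keep an id -> document lookup so a document can be targeted later.
id_to_doc = dict(zip(ids, documents))


def find_id(text: str) -> Optional[str]:
    """Return the id of the first document containing `text`, or None."""
    for doc_id, doc in id_to_doc.items():
        if text in doc:
            return doc_id
    return None


akbar_id = find_id("Akbar")
print(akbar_id, "->", id_to_doc[akbar_id])
```

With a real collection, the list returned by add would take the place of the generated `ids`.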

📝 Running Queries

Next, we prepare a second set of documents with metadata and explicit IDs, add them to a new collection, and then run a sample query against it.

# Prepare your documents, metadata, and IDs
docs = ["Qdrant has Langchain integrations", "Qdrant also has Llama Index integrations"]
metadata = [
    {"source": "Langchain-docs"},
    {"source": "Linkedin-docs"},
]
ids = [42, 2]

# Use the new add method
client.add(collection_name="demo_collection", documents=docs, metadata=metadata, ids=ids)
[42, 2]

Behind the scenes, Qdrant Client uses the FastEmbed library to create a passage embedding for each document and then uses the Qdrant API to upsert the documents, together with their metadata, into the collection as points.
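To make the "embed, then upsert as points" step more concrete, here is a minimal standard-library sketch of how a document, its metadata, and its id might be combined into one record. Everything in it is illustrative: `Point`, `toy_embed`, and `build_points` are hypothetical names, and the toy embedding only stands in for a real model. Storing the text under a "document" key mirrors the metadata that appears in the query output later in this section.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical stand-in for a stored point; not the qdrant-client model."""
    id: int
    vector: list
    payload: dict


def toy_embed(text: str, dim: int = 8) -> list:
    """Toy embedding: bucket character codes, then L2-normalize.

    A placeholder for a real embedding model such as one from FastEmbed.
    """
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def build_points(docs, metadata, ids):
    """Combine each document, its metadata, and its id into one Point."""
    return [
        Point(id=i, vector=toy_embed(doc), payload={"document": doc, **meta})
        for doc, meta, i in zip(docs, metadata, ids)
    ]


docs = ["Qdrant has Langchain integrations", "Qdrant also has Llama Index integrations"]
metadata = [{"source": "Langchain-docs"}, {"source": "Linkedin-docs"}]
ids = [42, 2]

points = build_points(docs, metadata, ids)
for p in points:
    print(p.id, p.payload)
```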

search_result = client.query(
    collection_name="demo_collection", query_text="This is a query document"
)
print(search_result)
[QueryResponse(id=42, embedding=None, metadata={'document': 'Qdrant has Langchain integrations', 'source': 'Langchain-docs'}, document='Qdrant has Langchain integrations', score=0.8276550115796268), QueryResponse(id=2, embedding=None, metadata={'document': 'Qdrant also has Llama Index integrations', 'source': 'Linkedin-docs'}, document='Qdrant also has Llama Index integrations', score=0.8265536935180283)]
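The raw list of responses is hard to read, so a small helper that prints the top k documents can help. The sketch below is one possible shape for it, not part of the client: `Hit` and `format_top_k` are hypothetical names, and `Hit` is a stand-in that carries only the `document` and `score` fields visible in the output above, with the scores copied from that output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in exposing the `document` and `score` fields seen above."""
    document: str
    score: float


def format_top_k(hits, k: int = 3) -> list:
    """Return one formatted line for each of the k highest-scoring hits."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)[:k]
    return [
        f"{rank}. {h.document} (score={h.score:.4f})"
        for rank, h in enumerate(ranked, start=1)
    ]


hits = [
    Hit("Qdrant also has Llama Index integrations", 0.8265536935180283),
    Hit("Qdrant has Langchain integrations", 0.8276550115796268),
]

for line in format_top_k(hits, k=2):
    print(line)
# 1. Qdrant has Langchain integrations (score=0.8277)
# 2. Qdrant also has Llama Index integrations (score=0.8266)
```

Because the helper only reads `document` and `score`, it should also accept the objects in `search_result`, assuming they expose those attributes as the printed output suggests.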

🎬 Conclusion

This tutorial demonstrates the basics of working with the QdrantClient to add and query documents. By following this guide, you can easily integrate Qdrant into your projects for vector similarity search and retrieval.

Remember to close the client connection when you are done, and customize the query parameters to suit your specific needs.
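One way to make sure a connection is always released is the standard library's `contextlib.closing`, sketched below. `DemoClient` is a hypothetical stand-in so the example runs on its own. The pattern assumes the real client object exposes a `close()` method, which is worth confirming in the client documentation for your version.

```python
from contextlib import closing


class DemoClient:
    """Hypothetical stand-in for a client object that exposes `close()`."""

    def __init__(self) -> None:
        self.closed = False

    def query(self, text: str) -> list:
        if self.closed:
            raise RuntimeError("client is closed")
        return [f"result for: {text}"]

    def close(self) -> None:
        self.closed = True


# `closing` calls client.close() on exit, even if the body raises.
with closing(DemoClient()) as client:
    results = client.query("This is a query document")

print(results, client.closed)
```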

See the official Qdrant Python client documentation for more details on customization and advanced features.