Hybrid Search with FastEmbed & Qdrant
Author: Nirant Kasliwal
What will we do?
This notebook demonstrates the usage of Hybrid Search with FastEmbed & Qdrant.
- Setup: Download and install the required dependencies
- Preview data: Load and preview the data
- Create Sparse Embeddings: Create SPLADE++ embeddings for the data
- Create Dense Embeddings: Create BGE-Large-en-v1.5 embeddings for the data
- Indexing: Index the embeddings using Qdrant
- Search: Perform Hybrid Search using FastEmbed & Qdrant
- Ranking: Rank the search results with Reciprocal Rank Fusion (RRF)
Setup
To get started, you need a few dependencies; we'll install them next:
!pip install -qU qdrant-client fastembed datasets transformers
import json
import numpy as np
import pandas as pd
from datasets import load_dataset
from qdrant_client import QdrantClient
from qdrant_client.models import (
Distance,
NamedSparseVector,
NamedVector,
SparseVector,
PointStruct,
SearchRequest,
SparseIndexParams,
SparseVectorParams,
VectorParams,
ScoredPoint,
)
from transformers import AutoTokenizer
import fastembed
from fastembed import SparseEmbedding, SparseTextEmbedding, TextEmbedding
fastembed.__version__ # 0.2.5
dataset = load_dataset("tasksource/esci", split="train")
# We'll select the first 1000 examples for this demo
dataset = dataset.select(range(1000))
dataset = dataset.filter(lambda x: x["product_locale"] == "us")
dataset
Preview Data
source_df = dataset.to_pandas()
df = source_df.drop_duplicates(
subset=["product_text", "product_title", "product_bullet_point", "product_brand"]
)
df = df.dropna(subset=["product_text", "product_title", "product_bullet_point", "product_brand"])
df.head()
print(f"Catalog Item Count: {len(df)}\nQueries: {len(source_df)}")
df["combined_text"] = (
df["product_title"] + "\n" + df["product_text"] + "\n" + df["product_bullet_point"]
)
len(df)
Create Sparse Embeddings
sparse_model_name = "prithvida/Splade_PP_en_v1"
dense_model_name = "BAAI/bge-large-en-v1.5"
# This triggers the model download
sparse_model = SparseTextEmbedding(model_name=sparse_model_name, batch_size=32)
dense_model = TextEmbedding(model_name=dense_model_name, batch_size=32)
def make_sparse_embedding(texts: list[str]):
return list(sparse_model.embed(texts, batch_size=32))
sparse_embedding: list[SparseEmbedding] = make_sparse_embedding(
["Fastembed is a great library for text embeddings!"]
)
sparse_embedding
The previous output is a SparseEmbedding object for the first document in our list.
It contains two arrays: values and indices.
- The 'values' array represents the weights of the features (tokens) in the document.
- The 'indices' array represents the indices of these features in the model's vocabulary.
Each pair of corresponding values and indices represents a token and its weight in the document.
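For example, zipping the two arrays gives the raw (vocabulary index, weight) pairs. A minimal sketch, assuming the sparse_embedding list from the cell above:
# Pair each vocabulary index with its weight for the first (and only) embedding
first = sparse_embedding[0]
list(zip(first.indices.tolist(), first.values.tolist()))[:5]  # first five raw (index, weight) pairs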
This is still a little abstract, so let's use the tokenizer vocab to make sense of these indices.
SparseTextEmbedding.list_supported_models()
def get_tokens_and_weights(sparse_embedding, model_name):
# Find the tokenizer for the model
tokenizer_source = None
for model_info in SparseTextEmbedding.list_supported_models():
if model_info["model"].lower() == model_name.lower():
tokenizer_source = model_info["sources"]["hf"]
break
else:
raise ValueError(f"Model {model_name} not found in the supported models.")
tokenizer = AutoTokenizer.from_pretrained(tokenizer_source)
token_weight_dict = {}
for i in range(len(sparse_embedding.indices)):
token = tokenizer.decode([sparse_embedding.indices[i]])
weight = sparse_embedding.values[i]
token_weight_dict[token] = weight
# Sort the dictionary by weights
token_weight_dict = dict(
sorted(token_weight_dict.items(), key=lambda item: item[1], reverse=True)
)
return token_weight_dict
# Test the function with the first SparseEmbedding
print(json.dumps(get_tokens_and_weights(sparse_embedding[0], sparse_model_name), indent=4))
Create Dense Embeddings
def make_dense_embedding(texts: list[str]):
return list(dense_model.embed(texts))
dense_embedding = make_dense_embedding(["Fastembed is a great library for text embeddings!"])
dense_embedding[0].shape
product_texts = df["combined_text"].tolist()
%%time
df["sparse_embedding"] = make_sparse_embedding(product_texts)
Notice that FastEmbed uses data parallelism to speed up the embedding generation process.
This improves throughput and reduces the time it takes to generate embeddings for large datasets.
For our small dataset here, on my local machine, this brought the user (CPU) time of about 6 minutes 15 seconds down to a wall time of about 3 minutes 6 seconds, roughly a 2x speedup. The gain depends on the number of CPU cores available on the machine, current CPU usage, and other factors, so your mileage may vary.
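If you want to control the parallelism explicitly, the embed method accepts a parallel argument in recent FastEmbed versions. A minimal sketch, assuming parallel=0 means "use all available cores" as in FastEmbed 0.2.x:
# Sketch: explicit data-parallel embedding; parallel=0 uses all available CPU cores,
# while parallel=None (the default) keeps single-process encoding.
sparse_parallel = list(sparse_model.embed(product_texts, batch_size=32, parallel=0))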
df["sparse_embedding"]
%%time
df["dense_embedding"] = make_dense_embedding(product_texts)
Indexing
client = QdrantClient(":memory:")
About Qdrant
Qdrant is a vector similarity search engine that allows you to index and search high-dimensional vectors. It supports both sparse and dense embeddings, and it's a great tool for building search engines.
Here, we use the in-memory mode, which is NumPy under the hood, for demonstration purposes. In production, you can use the Docker image or Qdrant Cloud for full database support.
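For reference, switching to a real Qdrant instance is only a change to the client constructor. A sketch with placeholder URLs and keys, kept commented out so this notebook stays in-memory:
# client = QdrantClient(url="http://localhost:6333")  # local Docker container
# client = QdrantClient(url="https://YOUR-CLUSTER-URL", api_key="YOUR_API_KEY")  # Qdrant Cloud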
collection_name = "esci"
client.create_collection(
collection_name,
vectors_config={
"text-dense": VectorParams(
size=1024,  # BGE-large-en-v1.5 produces 1024-dimensional embeddings
distance=Distance.COSINE,
)
},
sparse_vectors_config={
"text-sparse": SparseVectorParams(
index=SparseIndexParams(
on_disk=False,
)
)
},
)
def make_points(df: pd.DataFrame) -> list[PointStruct]:
sparse_vectors = df["sparse_embedding"].tolist()
product_texts = df["combined_text"].tolist()
dense_vectors = df["dense_embedding"].tolist()
rows = df.to_dict(orient="records")
points = []
for idx, (text, sparse_vector, dense_vector) in enumerate(
zip(product_texts, sparse_vectors, dense_vectors)
):
sparse_vector = SparseVector(
indices=sparse_vector.indices.tolist(), values=sparse_vector.values.tolist()
)
point = PointStruct(
id=idx,
payload={
"text": text,
"product_id": rows[idx]["product_id"],
}, # Add any additional payload if necessary
vector={
"text-sparse": sparse_vector,
"text-dense": dense_vector.tolist(),
},
)
points.append(point)
return points
points: list[PointStruct] = make_points(df)
client.upsert(collection_name, points)
Search
def search(query_text: str):
# Compute sparse and dense vectors
query_sparse_vectors: list[SparseEmbedding] = make_sparse_embedding([query_text])
query_dense_vector: list[np.ndarray] = make_dense_embedding([query_text])
search_results = client.search_batch(
collection_name=collection_name,
requests=[
SearchRequest(
vector=NamedVector(
name="text-dense",
vector=query_dense_vector[0].tolist(),
),
limit=10,
with_payload=True,
),
SearchRequest(
vector=NamedSparseVector(
name="text-sparse",
vector=SparseVector(
indices=query_sparse_vectors[0].indices.tolist(),
values=query_sparse_vectors[0].values.tolist(),
),
),
limit=10,
with_payload=True,
),
],
)
return search_results
query_text = " revent 80 cfm"
search_results = search(query_text)
Ranking
We'll combine the results from the two models using Reciprocal Rank Fusion (RRF). You can read more about RRF here.
We select RRF for this task because:
1. It is a simple and effective method for combining search results.
2. It is robust to differences in the ranking scores across two or more ranking lists.
3. It is easy to implement and requires minimal tuning (only one parameter, alpha, which we don't tune here).
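Concretely, for each item $d$, RRF sums the reciprocal of its offset rank across the $n$ rank lists; using the same names as the code below:

$$\mathrm{RRF}(d) = \sum_{i=1}^{n} \frac{1}{\alpha + \mathrm{rank}_i(d)}$$

where $\mathrm{rank}_i(d)$ is the rank of $d$ in list $i$ (items missing from a list are assigned a large default_rank), and $\alpha = 60$ by default.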
def rrf(rank_lists, alpha=60, default_rank=1000):
"""
Optimized Reciprocal Rank Fusion (RRF) using NumPy for large rank lists.
:param rank_lists: A list of rank lists. Each rank list should be a list of (item, rank) tuples.
:param alpha: The parameter alpha used in the RRF formula. Default is 60.
:param default_rank: The default rank assigned to items not present in a rank list. Default is 1000.
:return: Sorted list of items based on their RRF scores.
"""
# Consolidate all unique items from all rank lists
all_items = set(item for rank_list in rank_lists for item, _ in rank_list)
# Create a mapping of items to indices
item_to_index = {item: idx for idx, item in enumerate(all_items)}
# Initialize a matrix to hold the ranks, filled with the default rank
rank_matrix = np.full((len(all_items), len(rank_lists)), default_rank)
# Fill in the actual ranks from the rank lists
for list_idx, rank_list in enumerate(rank_lists):
for item, rank in rank_list:
rank_matrix[item_to_index[item], list_idx] = rank
# Calculate RRF scores using NumPy operations
rrf_scores = np.sum(1.0 / (alpha + rank_matrix), axis=1)
# Sort items based on RRF scores
sorted_indices = np.argsort(-rrf_scores) # Negative for descending order
# Retrieve sorted items
sorted_items = [(list(item_to_index.keys())[idx], rrf_scores[idx]) for idx in sorted_indices]
return sorted_items
# Example usage
rank_list1 = [("A", 1), ("B", 2), ("C", 3)]
rank_list2 = [("B", 1), ("C", 2), ("D", 3)]
rank_list3 = [("A", 2), ("D", 1), ("E", 3)]
# Combine the rank lists
sorted_items = rrf([rank_list1, rank_list2, rank_list3])
sorted_items
Based on this, let's convert our sparse and dense results into rank lists, and then combine them with the Reciprocal Rank Fusion (RRF) algorithm.
def rank_list(search_result: list[ScoredPoint]):
return [(point.id, rank + 1) for rank, point in enumerate(search_result)]
dense_rank_list, sparse_rank_list = rank_list(search_results[0]), rank_list(search_results[1])
rrf_rank_list = rrf([dense_rank_list, sparse_rank_list])
rrf_rank_list
def find_point_by_id(
client: QdrantClient, collection_name: str, rrf_rank_list: list[tuple[int, float]]
):
return client.retrieve(
collection_name=collection_name, ids=[item[0] for item in rrf_rank_list]
)
find_point_by_id(client, collection_name, rrf_rank_list)
Next, let's check the ESCI (Exact, Substitute, Complement, Irrelevant) label for the results against the source data.
ids = [item[0] for item in rrf_rank_list]
df[df["query"] == query_text]
for idx in ids:
print(df.iloc[idx]["esci_label"])
This was amazing! We pulled only Exact results with k=10. That's a great result for a dataset this small, using out-of-the-box vectors that aren't even fine-tuned for e-commerce.
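As a quick sanity check of that claim, we can look at the label distribution of the retrieved items directly. A one-line sketch; like the loop above, it assumes the fused point ids are positional indices into df:
# Distribution of ESCI labels among the retrieved items
df.iloc[ids]["esci_label"].value_counts()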
len(rrf_rank_list)
Conclusion
In this notebook, we demonstrated the usage of Hybrid Search with FastEmbed & Qdrant. We used FastEmbed to create Sparse and Dense embeddings for the data and indexed them using Qdrant. We then performed Hybrid Search using FastEmbed & Qdrant and ranked the search results using Reciprocal Rank Fusion (RRF).