
FastEmbed on GPU

As of version 0.2.7, FastEmbed supports GPU acceleration.

This notebook covers the installation and usage of FastEmbed on GPU.

Installation

FastEmbed depends on onnxruntime and inherits its scheme of GPU support.

To run ONNX models on a GPU, you need the onnxruntime-gpu package, which replaces onnxruntime and provides all of its functionality. FastEmbed mimics this behavior: GPU support requires the fastembed-gpu package instead of fastembed.

!pip install fastembed-gpu

NOTE: onnxruntime-gpu and onnxruntime can't be installed in the same environment. If you have onnxruntime installed, you need to uninstall it before installing onnxruntime-gpu. The same is true for fastembed and fastembed-gpu.
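
To verify which build is active in your environment, you can ask onnxruntime for its available execution providers. This is a minimal sanity check, assuming onnxruntime-gpu was pulled in by fastembed-gpu:

import onnxruntime as ort

# A GPU build lists "CUDAExecutionProvider" here;
# the CPU-only onnxruntime package lists only "CPUExecutionProvider".
print(ort.get_available_providers())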

CUDA 12.x support

By default, onnxruntime-gpu ships with CUDA 11.8 support. CUDA 12.x support requires installing onnxruntime-gpu from a dedicated package index:

!pip install onnxruntime-gpu -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ -qq
!pip install fastembed-gpu -qqq

You can check your CUDA version with commands such as nvidia-smi or nvcc --version.

Google Colab notebooks have CUDA 12.x.
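
After installing the CUDA 12.x build, you can confirm that onnxruntime was compiled with GPU support by querying its target device. This is a quick check; it reports the device the package was built for, not whether the runtime libraries load correctly:

import onnxruntime as ort

# Prints "GPU" for a CUDA-enabled build, "CPU" for the CPU-only package
print(ort.get_device())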

CUDA drivers

FastEmbed does not include CUDA drivers or CuDNN libraries; you need to set up the environment yourself. The dependencies required for your chosen onnxruntime version are listed in the CUDA Execution Provider requirements.

Setting up fastembed-gpu on GCP

CUDA drivers

The CUDA 11.8 or CUDA 12.x toolkit has to be installed if it hasn't been set up yet.

Example of setting up CUDA 12.x on Ubuntu 22.04

Make sure to download the archive that matches your platform, CPU architecture, and OS distribution.

For Ubuntu 22.04 on an x86_64 CPU, the following archive has to be downloaded:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda
NOTE: Specific CUDA libraries can be found in the meta packages section of the CUDA installation guide.

NOTE: When installing CUDA, the LD_LIBRARY_PATH environment variable might not be set by default. Make sure to add the following line to your environment variables:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

This ensures that the CUDA libraries are properly linked.
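
To confirm that the variable is visible to your Python process, a minimal check (assuming the default /usr/local/cuda install path) is:

import os

# The CUDA library directory should appear in LD_LIBRARY_PATH
print("/usr/local/cuda/lib64" in os.environ.get("LD_LIBRARY_PATH", ""))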

CuDNN 9.x

The CuDNN 9.x library can be installed via the following archive.

Example of setting up CuDNN 9.x on Ubuntu 22.04

The CuDNN 9.x archive for Ubuntu 22.04 x86_64 can be downloaded and installed as follows:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cudnn
NOTE: When installing CuDNN, you can choose a specific version: cudnn-cuda-11 or cudnn-cuda-12.

Common issues

The following are some common issues that may arise when the dependencies of fastembed-gpu are not installed properly:

CUDA library is not installed:

FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.x: cannot open shared object file: No such file or directory

CuDNN library is not installed:

FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.x: cannot open shared object file: No such file or directory

CUDA library path is not set:

FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcufft.so.x: failed to map segment from shared object

Make sure to add the following line to your environment variables:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
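
To narrow down which dependency is missing, you can try loading the libraries named in the errors above directly. This is a rough diagnostic sketch using ctypes; the sonames are taken from the error messages, and the exact version suffixes depend on your CUDA and CuDNN installation:

import ctypes

# Try each library the CUDA execution provider depends on;
# an OSError pinpoints the missing or unlinked dependency.
# Runtime-only installs may expose only versioned names,
# e.g. libcudnn.so.9 instead of libcudnn.so.
for lib in ("libcublasLt.so", "libcudnn.so", "libcufft.so"):
    try:
        ctypes.CDLL(lib)
        print(f"{lib}: OK")
    except OSError as err:
        print(f"{lib}: {err}")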

Usage

from typing import List

import numpy as np

from fastembed import TextEmbedding

# Initialize the model with the CUDA execution provider to run on GPU
embedding_model_gpu = TextEmbedding(
    model_name="BAAI/bge-small-en-v1.5", providers=["CUDAExecutionProvider"]
)
# Inspect which providers the underlying onnxruntime session actually uses
embedding_model_gpu.model.model.get_providers()

['CUDAExecutionProvider', 'CPUExecutionProvider']
# Build a batch of 500 identical documents to benchmark throughput
documents: List[str] = list(np.repeat("Demonstrating GPU acceleration in fastembed", 500))
%%timeit
list(embedding_model_gpu.embed(documents))
43.4 ms ± 2.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Initialize the same model on CPU (default provider) for comparison
embedding_model_cpu = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
embedding_model_cpu.model.model.get_providers()
['CPUExecutionProvider']
%%timeit
list(embedding_model_cpu.embed(documents))
4.33 s ± 591 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
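
On this benchmark, the GPU run is roughly 100x faster than the CPU run. As a final sanity check (a sketch, not part of the original benchmark), you can confirm that both models produce near-identical embeddings; the tolerance here is a guess, since GPU and CPU kernels differ slightly in floating-point behavior:

# Embeddings from the GPU and CPU models should match up to a small tolerance
gpu_vec = next(iter(embedding_model_gpu.embed(["sanity check"])))
cpu_vec = next(iter(embedding_model_cpu.embed(["sanity check"])))
print(gpu_vec.shape, np.allclose(gpu_vec, cpu_vec, atol=1e-4))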