FastEmbed on GPU
As of version 0.2.7, FastEmbed supports GPU acceleration.
This notebook covers the installation and usage of fastembed on GPU.
Installation
Fastembed depends on onnxruntime and inherits its scheme of GPU support.
In order to use GPU with onnx models, you need the onnxruntime-gpu package, which substitutes all of the onnxruntime functionality.
Fastembed mimics this behavior and requires the fastembed-gpu package to be installed instead.
!pip install fastembed-gpu
NOTE: onnxruntime-gpu and onnxruntime can't be installed in the same environment. If you have onnxruntime installed, uninstall it before installing onnxruntime-gpu. The same is true for fastembed and fastembed-gpu.
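If you are unsure which of the conflicting packages ended up in your environment, you can inspect the installed distributions. This is a minimal sketch using only the standard library; the package names are the PyPI distribution names mentioned above:

```python
# Sketch: detect a conflicting onnxruntime / onnxruntime-gpu installation.
from importlib.metadata import distributions

# Collect the names of all installed distributions in this environment.
installed = {dist.metadata["Name"] for dist in distributions()}

# True only if both packages are present at the same time.
conflicting = {"onnxruntime", "onnxruntime-gpu"} <= installed
if conflicting:
    print("Both onnxruntime and onnxruntime-gpu are installed - uninstall one.")
```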
CUDA 12.x support
You can check your CUDA version with commands such as nvidia-smi or nvcc --version.
Starting from version 1.19.0, onnxruntime-gpu ships with CUDA 12.x support by default.
Google Colab notebooks come with CUDA 12.x and CuDNN 8.x by default.
The latest version of onnxruntime-gpu requires CuDNN 9.x; to install it, run the following commands:
!sudo apt install cudnn9
!pip install fastembed-gpu -qqq
If you need to work with CuDNN 8.x, consider pinning onnxruntime-gpu to version 1.18.0 with CUDA 12.x using this command:
!pip install onnxruntime-gpu==1.18.0 -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ -qq
!pip install fastembed-gpu -qqq
CUDA 11.x support
To use the latest version of onnxruntime-gpu with CUDA 11.x, run the following command:
!pip install onnxruntime-gpu -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/ -qq
NOTE: Ensure that CuDNN 9.x is installed when working with the latest onnxruntime-gpu, whether using CUDA 11.x or 12.x.
CUDA drivers
FastEmbed does not include CUDA drivers or CuDNN libraries; you need to set up the environment yourself. The dependencies required for the chosen onnxruntime version are listed in the CUDA Execution Provider requirements.
Setting up fastembed-gpu on GCP
CUDA drivers
The CUDA 11.8 or CUDA 12.x toolkit has to be installed if it hasn't yet been set up.
Example of setting up CUDA 12.x on Ubuntu 22.04
Make sure to download the archive created for your particular platform, CPU architecture, and OS distribution.
For Ubuntu 22.04 with an x86_64 CPU, download the following archive:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda
NOTE: When installing CUDA, the LD_LIBRARY_PATH environment variable might not be set by default. Make sure to add the following line to your environment variables:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
CuDNN 9.x
The CuDNN 9.x library can be installed via the following archive.
Example of setting up CuDNN 9.x on Ubuntu 22.04
The CuDNN 9.x archive for Ubuntu 22.04 x86_64 can be downloaded and installed as follows:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cudnn
Common issues
The following are some common issues that may arise when fastembed-gpu is not installed properly:
CUDA library is not installed:
FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.x: cannot open shared object file: No such file or directory
CuDNN library is not installed:
FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.x: cannot open shared object file: No such file or directory
CUDA library path is not set:
FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcufft.so.x: failed to map segment from shared object
Make sure to add the following line to your environment variables:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
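A quick way to check from Python whether the CUDA library directory is visible to the dynamic loader; this is a minimal sketch assuming the default /usr/local/cuda/lib64 install location used above:

```python
# Sketch: verify that LD_LIBRARY_PATH contains the CUDA library directory.
# Assumes the default /usr/local/cuda/lib64 install location.
import os

ld_path = os.environ.get("LD_LIBRARY_PATH", "")
cuda_on_path = "/usr/local/cuda/lib64" in ld_path.split(":")
print("CUDA libs on LD_LIBRARY_PATH:", cuda_on_path)
```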
Usage
import numpy as np
from fastembed import TextEmbedding
# Request the CUDA execution provider for GPU inference
embedding_model_gpu = TextEmbedding(
    model_name="BAAI/bge-small-en-v1.5", providers=["CUDAExecutionProvider"]
)
embedding_model_gpu.model.model.get_providers()
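get_providers() reports the execution providers the underlying onnxruntime session is actually using. If you want to fall back to CPU gracefully when CUDA is unavailable, a small helper like the following can be used to build the providers list; this is a sketch, not part of the fastembed API, with provider names following onnxruntime's convention:

```python
# Sketch: choose available providers from a preference list.
# Not part of fastembed; provider names follow onnxruntime's convention.
def pick_providers(available: list[str]) -> list[str]:
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    # Keep only the preferred providers that are actually available,
    # falling back to CPU if none match.
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]
```

The returned list can then be passed as the providers argument of TextEmbedding.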
documents: list[str] = list(np.repeat("Demonstrating GPU acceleration in fastembed", 500))
%%timeit
list(embedding_model_gpu.embed(documents))
embedding_model_cpu = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
embedding_model_cpu.model.model.get_providers()
%%timeit
list(embedding_model_cpu.embed(documents))