FastEmbed on GPU
As of version 0.2.7, FastEmbed supports GPU acceleration.
This notebook covers installing and using FastEmbed on GPU.
Installation
FastEmbed depends on onnxruntime and inherits its GPU support scheme. To run ONNX models on GPU, you need the onnxruntime-gpu package, which is a drop-in replacement for all onnxruntime functionality. FastEmbed mirrors this behavior and requires the fastembed-gpu package to be installed.
!pip install fastembed-gpu
NOTE: onnxruntime-gpu and onnxruntime cannot be installed in the same environment. If you already have onnxruntime installed, uninstall it before installing onnxruntime-gpu. The same applies to fastembed and fastembed-gpu.
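To verify which of the conflicting packages ended up in your environment, you can query installed distributions with the standard library. This is a minimal sketch using only importlib.metadata; the helper name is illustrative, not part of FastEmbed:

```python
from importlib.metadata import version, PackageNotFoundError


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found


# Exactly one package from each CPU/GPU pair should report a version.
print(installed_versions(["onnxruntime", "onnxruntime-gpu", "fastembed", "fastembed-gpu"]))
```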
CUDA 12.x support
By default, onnxruntime-gpu ships with CUDA 11.8 support.
CUDA 12.x support requires installing onnxruntime-gpu from a dedicated index URL:
!pip install onnxruntime-gpu -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ -qq
!pip install fastembed-gpu -qqq
You can check your CUDA version with commands such as nvidia-smi or nvcc --version.
Google Colab notebooks ship with CUDA 12.x.
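If you prefer to check for the CUDA tooling programmatically, a stdlib-only sketch that just inspects PATH (no GPU required; the helper name is an assumption for illustration):

```python
import shutil


def cuda_tool_path(tool: str):
    """Return the full path of a CUDA utility if it is on PATH, else None."""
    return shutil.which(tool)


# Either tool being present suggests CUDA tooling is installed.
for tool in ("nvidia-smi", "nvcc"):
    path = cuda_tool_path(tool)
    print(f"{tool}: {path or 'not found'}")
```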
CUDA drivers
FastEmbed does not include CUDA drivers or cuDNN libraries; you need to set up the environment yourself. The dependencies required for your chosen onnxruntime version are listed in the onnxruntime documentation.
Usage
from typing import List
import numpy as np
from fastembed import TextEmbedding
embedding_model_gpu = TextEmbedding(
    model_name="BAAI/bge-small-en-v1.5", providers=["CUDAExecutionProvider"]
)
embedding_model_gpu.model.model.get_providers()
documents: List[str] = list(np.repeat("Demonstrating GPU acceleration in fastembed", 500))
%%timeit
list(embedding_model_gpu.embed(documents))
embedding_model_cpu = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
embedding_model_cpu.model.model.get_providers()
%%timeit
list(embedding_model_cpu.embed(documents))
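Outside a notebook, the %%timeit magic is unavailable. A minimal wall-clock timing helper built on time.perf_counter can stand in for it; the helper below is a sketch (the name and repeat count are assumptions), and the FastEmbed call is shown only in a comment since it requires a downloaded model:

```python
import time


def time_call(fn, *args, repeat: int = 3):
    """Return the best wall-clock time over several runs, like a rough %%timeit."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


# With a real model, you would time the embedding call like:
#   time_call(lambda: list(embedding_model_gpu.embed(documents)))
print(f"{time_call(sum, range(1_000_000)):.4f}s")
```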