Image Embedding

As of version 0.3.0, fastembed supports computing image embeddings.

The process is just as straightforward as with text embeddings. Let's see how it works.

from fastembed import ImageEmbedding

# Instantiating the model downloads the ONNX weights on first use
model = ImageEmbedding("Qdrant/resnet50-onnx")

# embed() accepts paths to image files and returns a lazy generator
embeddings_generator = model.embed(
    ["../../tests/misc/image.jpeg", "../../tests/misc/small_image.jpeg"]
)
# Materialize the generator to get one numpy array per image
embeddings_list = list(embeddings_generator)
embeddings_list
Fetching 3 files: 100%|██████████| 3/3 [00:00<00:00, 47482.69it/s]

[array([0.        , 0.        , 0.        , ..., 0.        , 0.01139933,
        0.        ], dtype=float32),
 array([0.02169187, 0.        , 0.        , ..., 0.        , 0.00848291,
        0.        ], dtype=float32)]
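
Each embedding is a plain numpy array, so the results can be compared directly. A minimal sketch of cosine similarity between the two images above, using nothing beyond numpy:

import numpy as np

a, b = embeddings_list
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine similarity: {cosine:.4f}")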

Preprocessing

Preprocessing is encapsulated in the ImageEmbedding class; the applied operations are identical to those provided by Hugging Face Transformers. You don't need to think about batching, opening and closing files, resizing images, and so on: fastembed takes care of it.
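
If your images are already loaded in memory, you don't have to write them to disk first. A minimal sketch, assuming the installed fastembed version accepts PIL.Image objects in addition to file paths (recent releases do):

from PIL import Image
from fastembed import ImageEmbedding

model = ImageEmbedding("Qdrant/resnet50-onnx")

# An already-opened image goes through the same resizing/normalization
# pipeline as a file path would (PIL input support is assumed here)
image = Image.open("../../tests/misc/image.jpeg")
embedding = next(model.embed([image]))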

Supported models

The list of supported image embedding models can be found in the fastembed documentation or retrieved by calling the ImageEmbedding.list_supported_models() method.

ImageEmbedding.list_supported_models()
[{'model': 'Qdrant/clip-ViT-B-32-vision',
  'dim': 512,
  'description': 'CLIP vision encoder based on ViT-B/32',
  'size_in_GB': 0.34,
  'sources': {'hf': 'Qdrant/clip-ViT-B-32-vision'},
  'model_file': 'model.onnx'},
 {'model': 'Qdrant/resnet50-onnx',
  'dim': 2048,
  'description': 'ResNet-50 from `Deep Residual Learning for Image Recognition <https://arxiv.org/abs/1512.03385>`__.',
  'size_in_GB': 0.1,
  'sources': {'hf': 'Qdrant/resnet50-onnx'},
  'model_file': 'model.onnx'}]
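
Since list_supported_models() returns plain dictionaries, a model can be picked programmatically, for example by embedding dimension. A small sketch based on the output above:

dims = {m["model"]: m["dim"] for m in ImageEmbedding.list_supported_models()}
dims
# {'Qdrant/clip-ViT-B-32-vision': 512, 'Qdrant/resnet50-onnx': 2048}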