Model Overview

SigLIP model pre-trained on WebLi at resolution 224x224. It was introduced in the paper Sigmoid Loss for Language Image Pre-Training by Zhai et al. and first released in this repository. SigLIP is CLIP, a multimodal (image-text) model, with a better loss function: the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows the batch size to be scaled up further, while also improving performance at smaller batch sizes. A TLDR of SigLIP by one of the authors can be found here.
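
For intuition, here is a minimal NumPy sketch of that pairwise sigmoid loss. It is illustrative only: embeddings are assumed L2-normalized, temperature and bias are learned scalars in the real model, and averaging over all n*n pairs is a simplification of the paper's batch-size normalization.

import numpy as np

def sigmoid_pairwise_loss(image_embeds, text_embeds, temperature, bias):
    # all-pairs logits: every image in the batch scored against every text
    logits = temperature * image_embeds @ text_embeds.T + bias
    # +1 on the diagonal (matching pairs), -1 off the diagonal (mismatches)
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0
    # -log sigmoid(label * logit) == log(1 + exp(-label * logit)), computed
    # stably via logaddexp; each pair is an independent binary problem, so
    # no batch-wide softmax normalization is required
    return np.mean(np.logaddexp(0.0, -labels * logits))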

Weights and Keras model code are released under the Apache 2 License.

Installation

Keras and KerasHub can be installed with:

pip install -U -q keras-hub
pip install -U -q keras

JAX, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the Keras Getting Started page.

Presets

The following model checkpoints are provided by the Keras team. Full code examples for each are available below.

Preset name Parameters Description
siglip_base_patch16_224 203.16M 200 million parameter, image size 224, pre-trained on WebLi.
siglip_base_patch16_256 203.20M 200 million parameter, image size 256, pre-trained on WebLi.
siglip_base_patch16_384 203.45M 200 million parameter, image size 384, pre-trained on WebLi.
siglip_base_patch16_512 203.79M 200 million parameter, image size 512, pre-trained on WebLi.
siglip_base_patch16_256_multilingual 370.63M 370 million parameter, image size 256, pre-trained on WebLi.
siglip2_base_patch16_224 375.19M 375 million parameter, patch size 16, image size 224, pre-trained on WebLi.
siglip2_base_patch16_256 375.23M 375 million parameter, patch size 16, image size 256, pre-trained on WebLi.
siglip2_base_patch32_256 376.86M 376 million parameter, patch size 32, image size 256, pre-trained on WebLi.
siglip2_base_patch16_384 376.86M 376 million parameter, patch size 16, image size 384, pre-trained on WebLi.
siglip_large_patch16_256 652.15M 652 million parameter, image size 256, pre-trained on WebLi.
siglip_large_patch16_384 652.48M 652 million parameter, image size 384, pre-trained on WebLi.
siglip_so400m_patch14_224 877.36M 877 million parameter, image size 224, shape-optimized version, pre-trained on WebLi.
siglip_so400m_patch14_384 877.96M 877 million parameter, image size 384, shape-optimized version, pre-trained on WebLi.
siglip2_large_patch16_256 881.53M 881 million parameter, patch size 16, image size 256, pre-trained on WebLi.
siglip2_large_patch16_384 881.86M 881 million parameter, patch size 16, image size 384, pre-trained on WebLi.
siglip2_large_patch16_512 882.31M 882 million parameter, patch size 16, image size 512, pre-trained on WebLi.
siglip_so400m_patch16_256_i18n 1.13B 1.1 billion parameter, image size 256, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch14_224 1.14B 1.1 billion parameter, patch size 14, image size 224, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch16_256 1.14B 1.1 billion parameter, patch size 16, image size 256, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch14_384 1.14B 1.1 billion parameter, patch size 14, image size 384, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch16_384 1.14B 1.1 billion parameter, patch size 16, image size 384, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch16_512 1.14B 1.1 billion parameter, patch size 16, image size 512, shape-optimized version, pre-trained on WebLi.
siglip2_giant_opt_patch16_256 1.87B 1.8 billion parameter, patch size 16, image size 256, pre-trained on WebLi.
siglip2_giant_opt_patch16_384 1.87B 1.8 billion parameter, patch size 16, image size 384, pre-trained on WebLi.

Example Usage

import keras
import numpy as np
from keras_hub.models import SigLIPBackbone, SigLIPTokenizer
from keras_hub.layers import SigLIPImageConverter

# instantiate the model and preprocessing tools
siglip = SigLIPBackbone.from_preset("siglip_large_patch16_256")
tokenizer = SigLIPTokenizer.from_preset(
    "siglip_large_patch16_256", sequence_length=64
)
image_converter = SigLIPImageConverter.from_preset("siglip_large_patch16_256")

# obtain tokens for some input text
tokens = tokenizer.tokenize(["mountains", "cat on tortoise", "house"])

# preprocess image and text
image = keras.utils.load_img("cat.jpg")
image = image_converter(np.array([image]).astype(float))

# query the model for similarities
siglip({
    "images": image,
    "token_ids": tokens,
})
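
The call above returns raw pairwise logits rather than probabilities; applying a sigmoid turns them into independent match scores. A minimal follow-up sketch, assuming the output dict exposes an "image_logits" entry of shape (num_images, num_texts), mirroring the sibling KerasHub CLIP backbone (inspect the returned keys to confirm on your version):

# re-run the forward pass, keeping the outputs this time
outputs = siglip({
    "images": image,
    "token_ids": tokens,
})
# "image_logits" is an assumed key; check outputs.keys() if it differs
probs = keras.ops.sigmoid(outputs["image_logits"])
print(probs)  # one independent match probability per image-text pair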

Example Usage with Hugging Face URI

import keras
import numpy as np
from keras_hub.models import SigLIPBackbone, SigLIPTokenizer
from keras_hub.layers import SigLIPImageConverter

# instantiate the model and preprocessing tools
siglip = SigLIPBackbone.from_preset("hf://keras/siglip_large_patch16_256")
tokenizer = SigLIPTokenizer.from_preset(
    "hf://keras/siglip_large_patch16_256", sequence_length=64
)
image_converter = SigLIPImageConverter.from_preset("hf://keras/siglip_large_patch16_256")

# obtain tokens for some input text
tokens = tokenizer.tokenize(["mountains", "cat on tortoise", "house"])

# preprocess image and text
image = keras.utils.load_img("cat.jpg")
image = image_converter(np.array([image]).astype(float))

# query the model for similarities
siglip({
    "images": image,
    "token_ids": tokens,
})