This notebook demonstrates how to fine-tune SigLIP 2, a multilingual vision-language model, for single-label image classification. SigLIP 2's pretraining recipe unifies techniques such as captioning-based pretraining, self-distillation, and masked prediction; the notebook fine-tunes the resulting pretrained checkpoint within a streamlined training pipeline. The workflow supports datasets in both structured and unstructured forms, making it adaptable to various domains and resource levels.

| Notebook Name | Description | Notebook Link |
|---|---|---|
| notebook-siglip2-finetune-type1 | Train/Test Splits | ⬇️Download |
| notebook-siglip2-finetune-type2 | Only Train Split | ⬇️Download |

To avoid notebook loading errors, please download the notebook before using it.
The notebook outlines two data-handling scenarios. In the first, the dataset ships with predefined train and test splits, which allows conventional supervised training followed by evaluation on held-out data. In the second, only a training split is available; the training set is then either partially reserved for validation or reused in full for evaluation. Note that evaluating on the same examples used for training measures how well the model fits that data, not how well it generalizes. This flexibility supports experimentation in constrained or domain-specific settings, where standard test annotations may not exist.
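The two scenarios can be sketched in plain Python as follows. This is a minimal illustration, not the notebooks' actual code: the function name `resolve_splits`, its parameters, and the `"train"`/`"test"` dictionary keys are hypothetical names chosen for this example. With the Hugging Face `datasets` library, `Dataset.train_test_split` offers a comparable way to hold out part of a training split.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def resolve_splits(
    splits: Dict[str, Sequence[T]],
    holdout_fraction: Optional[float] = 0.1,
    seed: int = 42,
) -> Tuple[List[T], List[T]]:
    """Return (train_examples, eval_examples) for either scenario.

    Hypothetical helper: `splits` maps split names to example sequences.
    """
    train = list(splits["train"])

    # Scenario 1: a predefined test split exists, so use it unchanged.
    if "test" in splits:
        return train, list(splits["test"])

    # Scenario 2, option B: no holdout requested, so the full training
    # set is reused for evaluation (this measures fit, not generalization).
    if holdout_fraction is None:
        return train, list(train)

    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")

    # Scenario 2, option A: shuffle reproducibly, then reserve a slice
    # of the training data for validation.
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


if __name__ == "__main__":
    # Toy (path, label) pairs standing in for an image dataset.
    examples = [(f"img_{i}.jpg", i % 3) for i in range(10)]
    train, val = resolve_splits({"train": examples}, holdout_fraction=0.2)
    print(len(train), len(val))
```

Fixing the seed keeps the validation subset stable across runs, so metrics from different fine-tuning runs stay comparable.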
Last updated: July 2025