Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
This example script shows how to train a CLIP-like vision-text dual encoder model using a pre-trained vision and text encoder using COCO dataset.