The authors first train CLIP from scratch, then fine-tune it end-to-end together with the classification and box heads on standard detection datasets using a bipartite matching loss.
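To make the bipartite matching step concrete, here is a minimal sketch of how predictions might be paired one-to-one with ground-truth objects before a loss is computed. It is an illustration under stated assumptions, not the authors' implementation: the cost terms (negative class probability plus a weighted L1 box distance), the `box_weight` parameter, and all function names are hypothetical, and the brute-force search stands in for the Hungarian algorithm that practical implementations typically use.

```python
from itertools import permutations


def l1_box_cost(pred_box, gt_box):
    """L1 distance between two boxes given as equal-length coordinate lists."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))


def match_cost(pred, gt, box_weight=1.0):
    """Cost of pairing one prediction with one ground-truth object.

    pred: (class_probs, box), gt: (label, box).
    The cost form is an assumption chosen for illustration.
    """
    class_probs, pred_box = pred
    label, gt_box = gt
    return -class_probs[label] + box_weight * l1_box_cost(pred_box, gt_box)


def bipartite_match(preds, gts, box_weight=1.0):
    """Find the one-to-one assignment with the lowest total cost.

    Returns (assignment, total_cost), where assignment[j] is the index of
    the prediction matched to ground-truth object j. Requires
    len(preds) >= len(gts). Brute force, so only suitable for tiny inputs.
    """
    best_assignment, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(
            match_cost(preds[i], gts[j], box_weight) for j, i in enumerate(perm)
        )
        if cost < best_cost:
            best_assignment, best_cost = perm, cost
    return list(best_assignment), best_cost


# Three predictions, two ground-truth objects (toy values).
preds = [
    ([0.1, 0.9], [0.5, 0.5, 0.2, 0.2]),
    ([0.8, 0.2], [0.1, 0.1, 0.3, 0.3]),
    ([0.5, 0.5], [0.9, 0.9, 0.1, 0.1]),
]
gts = [
    (0, [0.1, 0.1, 0.3, 0.3]),
    (1, [0.5, 0.5, 0.2, 0.2]),
]
assignment, total_cost = bipartite_match(preds, gts)
print(assignment)  # ground truth 0 pairs with prediction 1, ground truth 1 with prediction 0
```

Predictions left unmatched (index 2 here) would usually be treated as "no object" when the loss is computed. For realistic numbers of predictions, the same assignment problem is normally solved in polynomial time, for example with `scipy.optimize.linear_sum_assignment` on a cost matrix.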