Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
For the VCR task, the authors use a fine-tuned detector for generating visual embeddings, for all the checkpoints.