Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
Experiments on four vision-and-language tasks including VQA, VCR, NLVR2,
and Flickr30K show that VisualBERT outperforms or rivals with state-of-the-art models while being significantly
simpler.