We also show the generalizability of our
pretrained cross-modality model by adapting it to a challenging visual-reasoning task, NLVR, and improve the previous
best result by 22% absolute (54% to 76%).