5fa1a76
1
2
3
We also show the generalizability of our pretrained cross-modality model by adapting it to a challenging visual-reasoning task, NLVR, and improve the previous best result by 22% absolute (54% to 76%).