5fa1a76
1
2
After fine-tuning from our pretrained parameters, our model achieves the state-of-the-art results on two visual question answering datasets (i.e., VQA and GQA).