After fine-tuning from our pretrained parameters, our model achieves state-of-the-art results on two visual question answering datasets (i.e., VQA and GQA).