In LXMERT, we build a large-scale Transformer model that consists of three encoders:
an object relationship encoder, a language encoder, and a cross-modality encoder.
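
As a rough illustration of how three such encoders could fit together, the sketch below wires two single-modality encoders into a cross-modality step using plain scaled dot-product attention. It is a minimal sketch under stated assumptions, not the LXMERT implementation: all function names (`attend`, `language_encoder`, `object_relationship_encoder`, `cross_modality_encoder`, `three_encoder_forward`) and the toy inputs are hypothetical, and learned projections, multiple heads, feed-forward sublayers, layer normalization, and layer stacking are all omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, context):
    """Scaled dot-product attention with a residual connection.

    Assumption: no learned weights; keys and values are the raw context vectors.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, context)) for i in range(dim)]
        outputs.append([qi + mi for qi, mi in zip(q, mixed)])
    return outputs


def language_encoder(word_vecs):
    """Hypothetical stand-in: self-attention over word vectors."""
    return attend(word_vecs, word_vecs)


def object_relationship_encoder(object_vecs):
    """Hypothetical stand-in: self-attention over object (region) vectors."""
    return attend(object_vecs, object_vecs)


def cross_modality_encoder(lang, vis):
    """Hypothetical stand-in: each modality attends to the other."""
    lang_out = attend(lang, vis)  # words query objects
    vis_out = attend(vis, lang)   # objects query words
    return lang_out, vis_out


def three_encoder_forward(word_vecs, object_vecs):
    """Run both single-modality encoders, then the cross-modality step."""
    lang = language_encoder(word_vecs)
    vis = object_relationship_encoder(object_vecs)
    return cross_modality_encoder(lang, vis)


# Toy inputs (made up for illustration): 3 word vectors and 2 object vectors, dimension 4.
words = [[0.1, 0.2, 0.3, 0.4], [0.0, 1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
objects = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
lang_out, vis_out = three_encoder_forward(words, objects)
```

Each output keeps the length of its own input sequence (three word vectors, two object vectors), so the cross-modality step changes the content of the representations but not their shape.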