Both the language hidden states and the visual hidden states that LXMERT outputs are passed through the
cross-modality layer, so they contain information from both modalities.
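To make the statement concrete, here is a toy sketch in plain Python of what passing both streams through a cross-modality step implies. It is not LXMERT's actual implementation: the real layer also applies learned projections, self-attention and feed-forward sub-layers, and all names below (`cross_attend`, `cross_modality_layer`, the example vectors) are hypothetical. It only illustrates that, after cross-attention in both directions, each language vector depends on the visual input and each visual vector depends on the language input.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    # Each query attends over the context vectors (scaled dot-product
    # attention without learned weights), and the attended context is
    # added back to the query as a residual.
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        weights = softmax(scores)
        attended = [sum(w * c[i] for w, c in zip(weights, context)) for i in range(dim)]
        out.append([qi + ai for qi, ai in zip(q, attended)])
    return out


def cross_modality_layer(lang, vis):
    # Simplified assumption: language attends to vision and vision attends
    # to language, both computed from the layer's inputs.
    return cross_attend(lang, vis), cross_attend(vis, lang)


# Two "token" vectors and two alternative sets of three "region" vectors.
lang = [[1.0, 0.0], [0.0, 1.0]]
vis_a = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
vis_b = [[2.0, 0.0], [0.0, -1.0], [1.0, 1.0]]

lang_out_a, vis_out_a = cross_modality_layer(lang, vis_a)
lang_out_b, vis_out_b = cross_modality_layer(lang, vis_b)

# The same language input yields different language outputs when the
# visual input changes, so the language stream now carries visual information.
language_depends_on_vision = lang_out_a != lang_out_b
```

In the Hugging Face implementation, the two streams are returned as the `language_output` and `vision_output` fields of the model output; the toy above only mirrors that two-output structure.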