Specifically, in the pre-training stage LayoutLMv2 uses not only the existing masked
visual-language modeling task but also two new tasks, text-image alignment and text-image matching,
so that cross-modality interaction is better learned.
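
To make the two new objectives more concrete, the following is a minimal sketch of how training labels for them might be constructed. It is not the authors' implementation: the function names `make_tia_labels` and `make_tim_pair`, the `line_ids` input, and the default probabilities are all hypothetical and chosen for illustration only. The sketch assumes that text-image alignment amounts to a per-token "covered / not covered" prediction after some text lines are hidden in the page image, and that text-image matching amounts to a binary prediction of whether the text and the image come from the same page.

```python
import random


def make_tia_labels(line_ids, cover_prob=0.15, seed=0):
    """Sketch of text-image alignment labels (hypothetical helper).

    line_ids[i] is the index of the text line that token i belongs to.
    Whole lines are chosen at random to be covered in the image; every
    token gets label 1 if its line was covered and 0 otherwise.
    cover_prob is an illustrative default, not a value from the source.
    """
    rng = random.Random(seed)
    covered = {ln for ln in sorted(set(line_ids)) if rng.random() < cover_prob}
    labels = [1 if ln in covered else 0 for ln in line_ids]
    return labels, covered


def make_tim_pair(doc_index, num_docs, negative_prob=0.2, seed=0):
    """Sketch of a text-image matching pair (hypothetical helper).

    Returns (image_index, label). With probability negative_prob the text
    of document doc_index is paired with the image of a different document
    (label 0); otherwise it keeps its own image (label 1).
    negative_prob is an illustrative default, not a value from the source.
    """
    rng = random.Random(seed)
    if num_docs > 1 and rng.random() < negative_prob:
        other = rng.choice([i for i in range(num_docs) if i != doc_index])
        return other, 0
    return doc_index, 1


if __name__ == "__main__":
    # Five tokens spread over three text lines.
    labels, covered = make_tia_labels([0, 0, 1, 2, 2], cover_prob=0.5, seed=1)
    print("covered lines:", covered, "token labels:", labels)
    print("matching pair:", make_tim_pair(doc_index=3, num_docs=10, seed=1))
```

In an actual pre-training setup, labels like these would feed classification heads trained alongside the masked visual-language modeling loss; how the losses are weighted and combined is not specified here.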