Specifically, LayoutLMv2 uses not only the existing masked visual-language modeling task but also the new text-image alignment and text-image matching tasks in the pre-training stage, so that cross-modality interaction is learned better.
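The two new objectives can be illustrated by how their training labels might be built. The sketch below is a minimal illustration under stated assumptions, not LayoutLMv2's actual implementation. For text-image alignment, whole text lines are randomly chosen to be covered on the page image, and every token receives a binary "covered / not covered" label. Tokens already masked for masked visual-language modeling are excluded here using the label `-100`, which is assumed to follow the common PyTorch ignore-index convention. For text-image matching, some text/image pairs are turned into negatives by swapping in another page's image. The function names, `cover_prob`, `neg_prob`, and the `-100` convention are all hypothetical choices made for this example.

```python
import random
from typing import List, Optional, Sequence, Tuple

IGNORE = -100  # assumption: ignore-index convention borrowed from PyTorch losses


def text_image_alignment_labels(
    line_ids: Sequence[int],
    cover_prob: float = 0.15,
    seed: int = 0,
    mlm_masked: Optional[Sequence[bool]] = None,
) -> List[int]:
    """Hypothetical TIA label builder: 1 = token's line is covered in the image, 0 = not."""
    rng = random.Random(seed)
    # Choose whole text lines (not single tokens) to cover on the page image.
    covered = {line for line in sorted(set(line_ids)) if rng.random() < cover_prob}
    labels = [1 if line in covered else 0 for line in line_ids]
    if mlm_masked is not None:
        # Tokens masked for the MVLM task are skipped by the alignment loss.
        labels = [IGNORE if m else lab for lab, m in zip(labels, mlm_masked)]
    return labels


def text_image_matching_pairs(
    pages: Sequence[Tuple[str, str]],
    neg_prob: float = 0.5,
    seed: int = 0,
) -> List[Tuple[str, str, int]]:
    """Hypothetical TIM sampler: returns (text, image, label), label 1 if text and image match."""
    rng = random.Random(seed)
    out = []
    for i, (text, image) in enumerate(pages):
        if len(pages) > 1 and rng.random() < neg_prob:
            # Negative sample: pair this page's text with a different page's image.
            j = rng.choice([k for k in range(len(pages)) if k != i])
            out.append((text, pages[j][1], 0))
        else:
            out.append((text, image, 1))
    return out
```

Labeling at the line level means that all tokens on the same line share one alignment label. The model can therefore only solve the task by relating each token's text and position to the corresponding image region.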