Additionally, LayoutLMv3 is pre-trained with a word-patch alignment objective to learn cross-modal alignment by predicting whether the corresponding image patch of a text word is masked.
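To make the word-patch alignment target concrete, the following is a minimal sketch of how a per-word label could be derived from a set of masked image patches. It is an illustration under stated assumptions, not the actual LayoutLMv3 pre-training code: the helper names `patch_index` and `wpa_targets` are hypothetical, the 224-pixel image and 16-pixel patch sizes are assumed defaults, and choosing the patch by the center of the word box is a simplification.

```python
from typing import List, Set, Tuple

# Word bounding box in pixel coordinates: (x0, y0, x1, y1).
Box = Tuple[float, float, float, float]


def patch_index(box: Box, image_size: int = 224, patch_size: int = 16) -> int:
    """Return the row-major index of the patch containing the box center.

    Assumption: a word "corresponds" to the single patch under its center.
    """
    per_side = image_size // patch_size
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    # Clamp so that a center on the far image edge still maps to a valid patch.
    col = min(int(cx // patch_size), per_side - 1)
    row = min(int(cy // patch_size), per_side - 1)
    return row * per_side + col


def wpa_targets(
    word_boxes: List[Box],
    masked_patches: Set[int],
    image_size: int = 224,
    patch_size: int = 16,
) -> List[int]:
    """Binary target per word: 1 if its corresponding patch is masked, else 0."""
    return [
        int(patch_index(box, image_size, patch_size) in masked_patches)
        for box in word_boxes
    ]


# Hypothetical example: three word boxes and two masked patches.
boxes = [(0, 0, 16, 16), (20, 0, 30, 10), (200, 200, 224, 224)]
targets = wpa_targets(boxes, masked_patches={1, 195})
```

In this sketch a model head would be trained to predict `targets` from the text token representations. How masked text tokens are handled and how patches are chosen for masking are left out here.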