MPLT Pretrained model

This is the pretrained MPLT model, trained on the masked language modeling, masked image modeling, and word-patch alignment tasks, inspired by LayoutLMv3.
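If the checkpoint follows the usual transformers conventions, it could be loaded roughly as sketched below. This is only a sketch: the repository id is hypothetical, and `trust_remote_code` is only needed if MPLT ships a custom architecture.

```python
from transformers import AutoModel, AutoProcessor

# Hypothetical repository id; replace with the actual Hub id of this model.
repo_id = "username/mplt-pretrained"

# Load the pretrained backbone and its processor (assumes standard transformers packaging).
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
```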

The training repo is available here.

The model was pretrained for 3 days on 4 A100 GPUs, over 5 epochs with a batch size of 2048 samples. The learning rate followed a linear warmup for 500k steps, then a linear decay scheduled to end at 20 epochs (which is never reached).
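A minimal sketch of that schedule, assuming a standard transformers linear scheduler; the optimizer and the number of steps per epoch below are hypothetical, as the card does not state them:

```python
import torch
from transformers import get_linear_schedule_with_warmup

# Placeholder model and optimizer: the card does not specify which optimizer was used.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

warmup_steps = 500_000               # linear warmup, as stated above
steps_per_epoch = 100_000            # hypothetical; depends on dataset and batch size
total_steps = 20 * steps_per_epoch   # decay is scheduled to end at 20 epochs
trained_steps = 5 * steps_per_epoch  # training stops after 5 epochs, so the end is never reached

scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps
)

for _ in range(trained_steps):
    optimizer.step()
    scheduler.step()
```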

The raw and preprocessed PDF documents and training data are available here.

Training curves are available in the Training Metrics tab.
