DiT applies the self-supervised objective of BEiT (BERT pre-training of Image Transformers) to 42 million document images, allowing for state-of-the-art results on tasks including:
document image classification: the RVL-CDIP dataset (a collection of 400,000 images, each belonging to one of 16 classes).