Initializing image-to-text-sequence models from pretrained checkpoints has been shown to be effective, for example in
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang,
Zhoujun Li, and Furu Wei.