license: mit
pipeline_tag: image-text-to-text
This model was introduced in the paper [Docopilot: Improving Multimodal Models for Document-Level Understanding](https://openaccess.thecvf.com/content/CVPR2025/html/Duan_Docopilot_Improving_Multimodal_Models_for_Document-Level_Understanding_CVPR_2025_paper.html).
Please refer to https://github.com/OpenGVLab/Docopilot for details.