UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model Paper • 2310.05126 • Published Oct 8, 2023 • 1
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding Paper • 2307.02499 • Published Jul 4, 2023 • 15
Explore and Tell: Embodied Visual Captioning in 3D Environments Paper • 2308.10447 • Published Aug 21, 2023
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning Paper • 2404.16635 • Published Apr 25, 2024 • 2
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model Paper • 2311.18248 • Published Nov 30, 2023
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding Paper • 2409.03420 • Published Sep 5, 2024 • 27
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models Paper • 2408.04840 • Published Aug 9, 2024 • 35