---
license: apache-2.0
language:
- en
base_model:
- google/siglip-base-patch16-256-multilingual
- Qwen/Qwen2.5-3B-Instruct
pipeline_tag: image-text-to-text
tags:
- 2D_Medical_LVLMs
---

# MedM-VL-2D-3B-en

## Introduction

MedM-VL-2D-3B-en is a medical large vision-language model (LVLM) trained on **2D** medical images and **English** medical texts. It supports **report generation**, visual question answering (**VQA**), referring expression comprehension (**REC**), referring expression generation (**REG**), and **image classification**.

| Component | Config |
| :--- | :---: |
| Image encoder | google/siglip-base-patch16-256-multilingual |
| Connector | MLP (2-layer) |
| LLM | Qwen/Qwen2.5-3B-Instruct |
| Image resolution | 256×256 |
| Sequence length | 2048 |

## Evaluation

| Benchmark | Med-Flamingo | LLaVA-Med | RadFM | **MedM-VL-2D-3B-en** |
| :--- | :---: | :---: | :---: | :---: |
| MedMNISTderma | 0.012 | 0.258 | 0.051 | **0.786** |
| MedMNISTorgan | 0.089 | 0.668 | 0.189 | **0.808** |
| MedPix | 0.081 | **0.151** | - | 0.126 |
| MIMIC-CXR | **0.233** | 0.204 | 0.068 | 0.199 |
| PathVQA | 0.334 | 0.378 | 0.248 | **0.634** |
| SAMedidentify | - | 0.458 | - | **0.693** |
| SAMedrefer | - | 0.086 | - | **0.235** |
| SLAKEidentify | - | 0.272 | - | **0.727** |
| SLAKErefer | - | 0.041 | - | **0.313** |
| SLAKEvqa | 0.215 | 0.337 | 0.817 | **0.841** |

## Quickstart

See the [MedM-VL](https://github.com/MSIIP/MedM-VL) repository for setup and usage instructions.

## Citation

```bibtex
@article{shi2025medm,
  title={MedM-VL: What Makes a Good Medical LVLM?},
  author={Shi, Yiming and Yang, Shaoshuai and Zhu, Xun and Wang, Haoyu and Li, Miao and Wu, Ji},
  journal={arXiv preprint arXiv:2504.04323},
  year={2025}
}
```
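
## Sizing sketch

The Config table in the Introduction fixes the image resolution (256), the encoder patch size (16, from `patch16` in the encoder name), and the sequence length (2048). The sketch below works out what those numbers imply for the token budget and the connector size. It is a back-of-the-envelope estimate under stated assumptions, not the MedM-VL implementation: the vision width of 768 (SigLIP-base), the LLM hidden size of 2048 (Qwen2.5-3B), a LLaVA-style Linear → GELU → Linear connector mapping 768 → 2048 → 2048, and passing every patch token to the LLM without pooling are all assumptions not stated in this card.

```python
# Back-of-the-envelope sizing for the configuration in the Config table.
# ASSUMPTIONS (not stated in this card): SigLIP-base vision width = 768,
# Qwen2.5-3B hidden size = 2048, the connector is a LLaVA-style
# Linear -> GELU -> Linear MLP mapping 768 -> 2048 -> 2048, and every
# patch token is passed to the LLM without pooling.

IMAGE_RESOLUTION = 256   # from the Config table
PATCH_SIZE = 16          # from the encoder name (patch16)
SEQUENCE_LENGTH = 2048   # from the Config table
VISION_DIM = 768         # assumption
LLM_DIM = 2048           # assumption


def num_image_tokens(resolution: int, patch: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    per_side = resolution // patch
    return per_side * per_side


def mlp2_params(d_in: int, d_hidden: int, d_out: int) -> int:
    """Weights plus biases of a 2-layer MLP (the activation has no parameters)."""
    return (d_in * d_hidden + d_hidden) + (d_hidden * d_out + d_out)


image_tokens = num_image_tokens(IMAGE_RESOLUTION, PATCH_SIZE)
text_budget = SEQUENCE_LENGTH - image_tokens  # ignores special/template tokens
connector_params = mlp2_params(VISION_DIM, LLM_DIM, LLM_DIM)

print(f"image tokens per image: {image_tokens}")      # 256
print(f"tokens left for text:   {text_budget}")       # 1792
print(f"connector parameters:   {connector_params:,}")  # 5,771,264
```

Under these assumptions a single 256×256 image occupies 256 of the 2048 positions, and the connector is tiny (about 5.8M parameters) compared with the 3B-parameter LLM.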