---
license: apache-2.0
language:
- en
base_model:
- google/siglip-base-patch16-256-multilingual
- Qwen/Qwen2.5-3B-Instruct
pipeline_tag: image-text-to-text
tags:
- 2D_Medical_LVLMs
---
# MedM-VL-2D-3B-en
## Introduction
MedM-VL-2D-3B-en is a 2D medical large vision-language model (LVLM) trained on **2D** medical images and **English** medical text. It supports **report generation**, visual question answering (**VQA**), referring expression comprehension (**REC**), referring expression generation (**REG**), and **image classification**.

| Component | Config |
| :--- | :---: |
| Image encoder | google/siglip-base-patch16-256-multilingual |
| Connector | MLP (2-layer) |
| LLM | Qwen/Qwen2.5-3B-Instruct |
| Image resolution | 256 × 256 pixels |
| Sequence length | 2048 tokens |
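
The configuration above implies a rough token budget per sample. The sketch below is illustrative arithmetic only, not code from the MedM-VL repository. It assumes that the patch size is 16 (read from the encoder name `patch16`), that the MLP connector emits one LLM token per image patch with no pooling or extra special tokens, and that each sequence contains a single image. None of these assumptions is stated in this card, and the helper names are hypothetical.

```python
# Values taken from the configuration table above.
IMAGE_RESOLUTION = 256   # pixels per side
SEQUENCE_LENGTH = 2048   # maximum tokens per sequence
# Assumption: patch size inferred from the encoder name "...-patch16-256-...".
PATCH_SIZE = 16


def visual_token_count(resolution: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be divisible by patch_size")
    patches_per_side = resolution // patch_size
    return patches_per_side ** 2


def text_token_budget(sequence_length: int, visual_tokens: int) -> int:
    """Tokens left for text, assuming one LLM token per image patch."""
    return sequence_length - visual_tokens


visual_tokens = visual_token_count(IMAGE_RESOLUTION, PATCH_SIZE)
print(visual_tokens)                                      # 256
print(text_token_budget(SEQUENCE_LENGTH, visual_tokens))  # 1792
```

Under these assumptions, one image occupies 256 of the 2048 positions, leaving about 1792 tokens for the prompt and the generated text.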
## Evaluation
| Benchmark | Med-Flamingo | LLaVA-Med | RadFM | **MedM-VL-2D-3B-en** |
| :--- | :---: | :---: | :---: | :---: |
| MedMNIST (derma) | 0.012 | 0.258 | 0.051 | **0.786** |
| MedMNIST (organ) | 0.089 | 0.668 | 0.189 | **0.808** |
| MedPix | 0.081 | **0.151** | - | 0.126 |
| MIMIC-CXR | **0.233** | 0.204 | 0.068 | 0.199 |
| PathVQA | 0.334 | 0.378 | 0.248 | **0.634** |
| SAMed (identify) | - | 0.458 | - | **0.693** |
| SAMed (refer) | - | 0.086 | - | **0.235** |
| SLAKE (identify) | - | 0.272 | - | **0.727** |
| SLAKE (refer) | - | 0.041 | - | **0.313** |
| SLAKE (VQA) | 0.215 | 0.337 | 0.817 | **0.841** |
## Quickstart
For usage instructions, see the [MedM-VL](https://github.com/MSIIP/MedM-VL) repository on GitHub.
## Citation
```bibtex
@article{shi2025medm,
  title={MedM-VL: What Makes a Good Medical LVLM?},
  author={Shi, Yiming and Yang, Shaoshuai and Zhu, Xun and Wang, Haoyu and Li, Miao and Wu, Ji},
  journal={arXiv preprint arXiv:2504.04323},
  year={2025}
}
```