MedM-VL-2D-3B-en is a 2D medical large vision-language model (LVLM) trained on 2D medical images and English medical text. It supports tasks such as report generation, visual question answering (VQA), referring expression comprehension (REC), referring expression generation (REG), and image classification.
Config | Value |
---|---|
Image encoder | google/siglip-base-patch16-256-multilingual |
Connector | MLP (2-layer) |
LLM | Qwen/Qwen2.5-3B-Instruct |
Image resolution | 256×256 |
Sequence length | 2048 |
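
The configuration above implies a rough token budget, which the following minimal sketch works out. It assumes a patch size of 16 (read off the `patch16` part of the encoder name), that every patch embedding is passed through the connector to the LLM without pooling, and that image tokens count toward the 2048-token sequence length. None of these assumptions is confirmed by this card, and the names `VisionConfig`, `num_image_tokens`, and `text_token_budget` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionConfig:
    image_size: int = 256        # "Image resolution" row of the config table
    patch_size: int = 16         # assumption: taken from "patch16" in the encoder name
    sequence_length: int = 2048  # "Sequence length" row of the config table


def num_image_tokens(cfg: VisionConfig) -> int:
    """Number of patch tokens for one square image, assuming no pooling."""
    if cfg.image_size % cfg.patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = cfg.image_size // cfg.patch_size
    return patches_per_side * patches_per_side


def text_token_budget(cfg: VisionConfig) -> int:
    """Tokens left for text if one image shares the sequence with the prompt."""
    return cfg.sequence_length - num_image_tokens(cfg)


cfg = VisionConfig()
print(num_image_tokens(cfg))   # 256
print(text_token_budget(cfg))  # 1792
```

Under these assumptions, a single image occupies 256 of the 2048 positions, leaving 1792 for the prompt and the generated text.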
Benchmark | Med-Flamingo | LLaVA-Med | RadFM | MedM-VL-2D-3B-en |
---|---|---|---|---|
MedMNIST (derma) | 0.012 | 0.258 | 0.051 | 0.786 |
MedMNIST (organ) | 0.089 | 0.668 | 0.189 | 0.808 |
MedPix | 0.081 | 0.151 | - | 0.126 |
MIMIC-CXR | 0.233 | 0.204 | 0.068 | 0.199 |
PathVQA | 0.334 | 0.378 | 0.248 | 0.634 |
SAMed (identify) | - | 0.458 | - | 0.693 |
SAMed (refer) | - | 0.086 | - | 0.235 |
SLAKE (identify) | - | 0.272 | - | 0.727 |
SLAKE (refer) | - | 0.041 | - | 0.313 |
SLAKE (vqa) | 0.215 | 0.337 | 0.817 | 0.841 |
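
As a quick way to read the table, the sketch below picks the top-scoring model per benchmark. It assumes that higher is better in every row; this card does not name the metrics, so treat that as an assumption. Missing entries (`-`) are represented as `None` and skipped, and the helper `best_model` is hypothetical.

```python
MODELS = ["Med-Flamingo", "LLaVA-Med", "RadFM", "MedM-VL-2D-3B-en"]

# Scores copied from the benchmark table; None marks a "-" entry.
SCORES = {
    "MedMNIST (derma)": [0.012, 0.258, 0.051, 0.786],
    "MedMNIST (organ)": [0.089, 0.668, 0.189, 0.808],
    "MedPix": [0.081, 0.151, None, 0.126],
    "MIMIC-CXR": [0.233, 0.204, 0.068, 0.199],
    "PathVQA": [0.334, 0.378, 0.248, 0.634],
    "SAMed (identify)": [None, 0.458, None, 0.693],
    "SAMed (refer)": [None, 0.086, None, 0.235],
    "SLAKE (identify)": [None, 0.272, None, 0.727],
    "SLAKE (refer)": [None, 0.041, None, 0.313],
    "SLAKE (vqa)": [0.215, 0.337, 0.817, 0.841],
}


def best_model(row):
    """Return the model with the highest reported score (assumes higher is better)."""
    scored = [(score, model) for score, model in zip(row, MODELS) if score is not None]
    return max(scored)[1]


winners = {bench: best_model(row) for bench, row in SCORES.items()}
not_won = [b for b, w in winners.items() if w != "MedM-VL-2D-3B-en"]

for bench, model in winners.items():
    print(f"{bench}: {model}")
print("Benchmarks not led by MedM-VL-2D-3B-en:", not_won)
```

Under the higher-is-better assumption, MedM-VL-2D-3B-en leads on 8 of the 10 rows; LLaVA-Med leads on MedPix and Med-Flamingo on MIMIC-CXR.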
For further details, please refer to the MedM-VL repository.
@article{shi2025medm,
title={MedM-VL: What Makes a Good Medical LVLM?},
author={Shi, Yiming and Yang, Shaoshuai and Zhu, Xun and Wang, Haoyu and Li, Miao and Wu, Ji},
journal={arXiv preprint arXiv:2504.04323},
year={2025}
}