---
base_model:
- OpenGVLab/InternVL-Chat-V1-2
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- medical
---

# MedRegA

Model for the paper "[Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks](https://huggingface.co/papers/2410.18387)".

🌐 Project Page: [https://medrega.github.io/](https://medrega.github.io/)

📄 Paper: [https://arxiv.org/abs/2410.18387](https://arxiv.org/abs/2410.18387)

💻 Code: [https://github.com/xmed-lab/MedRegA](https://github.com/xmed-lab/MedRegA)

## Introduction

We propose a **Region-Aware medical MLLM**, **MedRegA**, the first bilingual generalist medical AI system to simultaneously handle image-level and region-level medical vision-language tasks across a broad range of modalities. MedRegA not only enables three region-centric tasks, but also achieves the best performance on visual question answering, report generation, and medical image classification across 8 modalities, showcasing significant versatility.

![medrega.png](https://cdn-uploads.huggingface.co/production/uploads/65156d6ffccbf319e636279b/x4zUYvaPPjDEdm_NdiE-V.png)

## Citation

```
@article{wang2024interpretable,
  title={Interpretable bilingual multimodal large language model for diverse biomedical tasks},
  author={Wang, Lehan and Wang, Haonan and Yang, Honglong and Mao, Jiaji and Yang, Zehong and Shen, Jun and Li, Xiaomeng},
  journal={arXiv preprint arXiv:2410.18387},
  year={2024}
}
```
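
## Quick Start

A minimal loading sketch, assuming the checkpoint follows the `transformers` remote-code interface of its base model OpenGVLab/InternVL-Chat-V1-2. The repository ID below is illustrative, and the exact preprocessing and chat interface may differ; refer to the [MedRegA code](https://github.com/xmed-lab/MedRegA) for the official inference scripts and region-aware prompting.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical repository ID for illustration; substitute the actual model ID of this card.
MODEL_ID = "xmed-lab/MedRegA"

# InternVL-style checkpoints ship custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bfloat16 support
    trust_remote_code=True,
).eval().cuda()

# Image preprocessing and the chat/generation call follow the base model's remote code;
# see the MedRegA GitHub repository for complete image-text-to-text examples.
```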