metadata

base_model:
  - OpenGVLab/InternVL-Chat-V1-2
language:
  - en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
  - medical

MedRegA

Model for the paper "Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks".

🌐 Project Page: https://medrega.github.io/

📄 Paper: https://arxiv.org/abs/2410.18387

💻 Code: https://github.com/xmed-lab/MedRegA

Introduction

We propose MedRegA, a region-aware medical multimodal large language model (MLLM). It is the first bilingual generalist medical AI system to handle both image-level and region-level medical vision-language tasks across a broad range of modalities.

MedRegA not only supports three region-centric tasks but also achieves the best performance on visual question answering, report generation, and medical image classification across 8 modalities, demonstrating its versatility.

Citation

@article{wang2024interpretable,
  title={Interpretable bilingual multimodal large language model for diverse biomedical tasks},
  author={Wang, Lehan and Wang, Haonan and Yang, Honglong and Mao, Jiaji and Yang, Zehong and Shen, Jun and Li, Xiaomeng},
  journal={arXiv preprint arXiv:2410.18387},
  year={2024}
}