---
base_model:
- OpenGVLab/InternVL-Chat-V1-2
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- medical
---

# MedRegA

Model for the paper "[Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks](https://huggingface.co/papers/2410.18387)".

🌐 Project Page: [https://medrega.github.io/](https://medrega.github.io/)

📄 Paper: [https://arxiv.org/abs/2410.18387](https://arxiv.org/abs/2410.18387)

💻 Code: [https://github.com/xmed-lab/MedRegA](https://github.com/xmed-lab/MedRegA)

## Introduction

We propose a **Region-Aware medical MLLM**, **MedRegA**, the first bilingual generalist medical AI system to simultaneously handle image-level and region-level medical vision-language tasks across a broad range of modalities. MedRegA not only enables three region-centric tasks, but also achieves the best performance on visual question answering, report generation, and medical image classification across 8 modalities, showcasing significant versatility.

![medrega.png](https://cdn-uploads.huggingface.co/production/uploads/65156d6ffccbf319e636279b/x4zUYvaPPjDEdm_NdiE-V.png)

## Citation

```
@article{wang2024interpretable,
  title={Interpretable bilingual multimodal large language model for diverse biomedical tasks},
  author={Wang, Lehan and Wang, Haonan and Yang, Honglong and Mao, Jiaji and Yang, Zehong and Shen, Jun and Li, Xiaomeng},
  journal={arXiv preprint arXiv:2410.18387},
  year={2024}
}
```
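
## Quick Start

A minimal loading sketch, assuming the checkpoint follows the `transformers` remote-code interface of its base model OpenGVLab/InternVL-Chat-V1-2. The repository ID below is illustrative, and the exact preprocessing and chat interface may differ; refer to the [MedRegA code](https://github.com/xmed-lab/MedRegA) for the official inference scripts and region-aware prompting.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical repository ID for illustration; substitute the actual model ID of this card.
MODEL_ID = "xmed-lab/MedRegA"

# InternVL-style checkpoints ship custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bfloat16 support
    trust_remote_code=True,
).eval().cuda()

# Image preprocessing and the chat/generation call follow the base model's remote code;
# see the MedRegA GitHub repository for complete image-text-to-text examples.
```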