MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation

Preprint

Khai Le-Duc*, Tuyen Tran*,

Bach Phan Tat, Nguyen Kim Hai Bui, Quan Dang, Hung-Phong Tran, Thanh-Thuy Nguyen, Ly Nguyen, Tuan-Minh Phan, Thi Thu Phuong Tran, Chris Ngo,

Nguyen X. Khanh**, Thanh Nguyen-Tang**

*Equal contribution

**Equal supervision

Abstract: Multilingual speech translation (ST) in the medical domain enhances patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we present what is, to the best of our knowledge, the first systematic study on medical ST. First, we release MultiMed-ST, a large-scale ST dataset for the medical domain, spanning all translation directions in five languages: Vietnamese, English, German, French, and Chinese (Traditional and Simplified), together with the models. With 290,000 samples, our dataset is the largest medical machine translation (MT) dataset and the largest many-to-many multilingual ST dataset across all domains. Second, we present the most extensive analysis in ST research to date, including empirical baselines, a bilingual vs. multilingual comparative study, an end-to-end vs. cascaded comparative study, a task-specific vs. multi-task sequence-to-sequence (seq2seq) comparative study, code-switch analysis, and quantitative and qualitative error analysis. All code, data, and models are available online: https://github.com/leduckhai/MultiMed-ST.
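To make "all translation directions" concrete, the sketch below enumerates the ordered source-to-target pairs for five languages. It is only an illustration: the two-letter codes are assumed labels rather than identifiers taken from the dataset, and Chinese is treated as a single language regardless of script.

```python
from itertools import permutations

# Assumed language codes for illustration; the dataset may label them differently.
languages = ["vi", "en", "de", "fr", "zh"]

# Many-to-many means every ordered (source, target) pair with source != target.
directions = list(permutations(languages, 2))

# n languages give n * (n - 1) directed pairs.
print(len(directions))
for source, target in directions:
    print(f"{source} -> {target}")
```

Under these assumptions, five languages give 5 × 4 = 20 translation directions, since each language can be the source for the other four.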

Please star ⭐ the repository and/or cite the paper if you find this work helpful.

GitHub: https://github.com/leduckhai/MultiMed-ST
Citation: Please cite this paper: https://arxiv.org/abs/2504.03546

@article{le2025multimedst,
  title={MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation},
  author={Le-Duc, Khai and Tran, Tuyen and Tat, Bach Phan and Bui, Nguyen Kim Hai and Dang, Quan and Tran, Hung-Phong and Nguyen, Thanh-Thuy and Nguyen, Ly and Phan, Tuan-Minh and Tran, Thi Thu Phuong and others},
  journal={arXiv preprint arXiv:2504.03546},
  year={2025}
}

Dataset and Models:

Dataset: HuggingFace dataset

Fine-tuned models: HuggingFace models

Contact:

Core developers:

Khai Le-Duc

University of Toronto, Canada
Email: duckhai.le@mail.utoronto.ca
GitHub: https://github.com/leduckhai

Tuyen Tran

Hanoi University of Science and Technology, Vietnam
Email: tuyencbt@gmail.com

Bui Nguyen Kim Hai

Eötvös Loránd University, Hungary
Email: htlulem185@gmail.com
