---
language: vie
datasets:
- legacy-datasets/common_voice
- vlsp2020_vinai_100h
- AILAB-VNUHCM/vivos
- doof-ferb/vlsp2020_vinai_100h
- doof-ferb/fpt_fosd
- doof-ferb/infore1_25hours
- linhtran92/viet_bud500
- doof-ferb/LSVSC
- doof-ferb/vais1000
- doof-ferb/VietMed_labeled
- NhutP/VSV-1100
- doof-ferb/Speech-MASSIVE_vie
- doof-ferb/BibleMMS_vie
- capleaf/viVoice
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- transcription
- audio
- speech
- chunkformer
- asr
- automatic-speech-recognition
license: cc-by-nc-4.0
model-index:
- name: ChunkFormer Large Vietnamese
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: common-voice-vietnamese
      type: common_voice
      args: vi
    metrics:
    - name: Test WER
      type: wer
      value: 6.66
    source:
      name: Common Voice Vi Leaderboard
      url: https://paperswithcode.com/sota/speech-recognition-on-common-voice-vi
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: VIVOS
      type: vivos
      args: vi
    metrics:
    - name: Test WER
      type: wer
      value: 4.18
    source:
      name: Vivos Leaderboard
      url: https://paperswithcode.com/sota/speech-recognition-on-vivos
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: VLSP - Task 1
      type: vlsp
      args: vi
    metrics:
    - name: Test WER
      type: wer
      value: 14.09
---

# **ChunkFormer-Large-Vie: Large-Scale Pretrained ChunkFormer for Vietnamese Automatic Speech Recognition**

[![Ranked #1: Speech Recognition on Common Voice Vi](https://img.shields.io/badge/Ranked%20%231%3A%20Speech%20Recognition%20on%20Common%20Voice%20Vi-%F0%9F%8F%86%20SOTA-blueviolet?style=for-the-badge&logo=paperswithcode&logoColor=white)](https://paperswithcode.com/sota/speech-recognition-on-common-voice-vi)
[![Ranked #1: Speech Recognition on VIVOS](https://img.shields.io/badge/Ranked%20%231%3A%20Speech%20Recognition%20on%20VIVOS-%F0%9F%8F%86%20SOTA-blueviolet?style=for-the-badge&logo=paperswithcode&logoColor=white)](https://paperswithcode.com/sota/speech-recognition-on-vivos)

[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
[![GitHub](https://img.shields.io/badge/GitHub-ChunkFormer-blue)](https://github.com/khanld/chunkformer)
[![Paper](https://img.shields.io/badge/Paper-ICASSP%202025-green)](https://arxiv.org/abs/2502.14673)
[![Model size](https://img.shields.io/badge/Params-110M-lightgrey#model-badge)](#description)

---

## Citation

If you use this work in your research, please cite:

```bibtex
@INPROCEEDINGS{10888640,
  author={Le, Khanh and Ho, Tuan Vu and Tran, Dung and Chau, Duc Thanh},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription},
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Scalability;Memory management;Graphics processing units;Signal processing;Performance gain;Hardware;Resource management;Speech processing;Standards;Context modeling;chunkformer;masked batch;long-form transcription},
  doi={10.1109/ICASSP49660.2025.10888640}
}
```

---

## Contact

- khanhld218@gmail.com
- [![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/khanld)
- [![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/khanhld257/)