wannaphong/wav2vec2-large-xlsr-53-th-cv8-deepcut

Automatic Speech Recognition

wannaphong committed on Jun 20, 2022

Commit 9fd958f · 1 Parent(s): 261e8ee

Create README.md

Files changed (1)

README.md +32 -0

README.md ADDED

	@@ -0,0 +1,32 @@

+---
+language:
+- th
+tags:
+- automatic-speech-recognition
+license: apache-2.0
+datasets:
+- common_voice
+metrics:
+- wer
+- cer
+---
+# Thai CommonVoice V8 (deepcut tokenizer)
+This model was trained on the Common Voice V8 dataset, which extends the Common Voice V7 dataset used by [airesearch/wav2vec2-large-xlsr-53-th](https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th). It was fine-tuned from [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53).
+## Datasets
+The dataset adds the new data in Common Voice V8 to the Common Voice V7 dataset. In other words, all data that is already in Common Voice V7 is removed from Common Voice V8 before Common Voice V8 is split, and the Common Voice V7 dataset is then added back to the resulting dataset.
+It uses the [ekapolc/Thai_commonvoice_split](https://github.com/ekapolc/Thai_commonvoice_split) script to split the Common Voice dataset.
+## Models
+This model was fine-tuned from the [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) model on the Thai Common Voice V8 dataset, with the text pre-tokenized using deepcut.tokenize.
+**Links:**
+- GitHub Dataset: [https://github.com/wannaphong/thai_commonvoice_dataset](https://github.com/wannaphong/thai_commonvoice_dataset)
+- Deepcut: [https://github.com/rkcosmos/deepcut](https://github.com/rkcosmos/deepcut)
+[WIP]
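
The dataset construction and pre-tokenization described in this README could be sketched roughly as below. This is an illustration under stated assumptions, not the author's actual pipeline: the function names (`new_v8_clips`, `split_clips`, `merge_splits`, `pretokenize`) are hypothetical, clips are identified by plain string IDs, the random split only stands in for the ekapolc/Thai_commonvoice_split script (whose real logic is not shown in the README), and joining tokens with single spaces is an assumed output format for the pre-tokenization step.

```python
import random
from typing import Callable, Dict, Iterable, List

SPLIT_NAMES = ("train", "dev", "test")


def new_v8_clips(v8_clips: Iterable[str], v7_clips: Iterable[str]) -> List[str]:
    """Return the clips that are in V8 but not in V7, keeping V8 order."""
    seen = set(v7_clips)
    return [clip for clip in v8_clips if clip not in seen]


def split_clips(
    clips: Iterable[str],
    dev_frac: float = 0.1,
    test_frac: float = 0.1,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Placeholder random split; NOT the ekapolc/Thai_commonvoice_split logic."""
    shuffled = list(clips)
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    return {
        "dev": shuffled[:n_dev],
        "test": shuffled[n_dev:n_dev + n_test],
        "train": shuffled[n_dev + n_test:],
    }


def merge_splits(
    v7_splits: Dict[str, List[str]],
    new_splits: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Add the existing V7 splits back, so V7 clips keep their original split."""
    return {
        name: list(v7_splits.get(name, [])) + list(new_splits.get(name, []))
        for name in SPLIT_NAMES
    }


def pretokenize(sentence: str, tokenize: Callable[[str], List[str]]) -> str:
    """Join word tokens with single spaces (assumed transcript format)."""
    tokens = (token.strip() for token in tokenize(sentence))
    return " ".join(token for token in tokens if token)
```

If deepcut is installed, `deepcut.tokenize` (which takes a string and returns a list of word strings) could be passed as the `tokenize` argument, for example `pretokenize(transcript, deepcut.tokenize)`. Keeping the tokenizer as a parameter is only a convenience for this sketch, so the splitting logic can be tried without the deepcut dependency.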