asahi417 committed · Commit 09aae83 · verified · 1 Parent(s): 4779456

Update README.md

Files changed (1): README.md (+8 −7)
README.md CHANGED
```diff
@@ -118,16 +118,17 @@ distil whisper is English ASR only).
 
 ### Inference Speed
 Although the cascaded approach is better in translation task, due to the nature of cascaded approach, the pipeline
-has additional complexity compared to the single end2end models for the sake of high accuracy.
-Following table shows the mean inference time on a single RTX 4090 (VRAM 24 GB) in second averaged over 10 trials on audio sample with different durations.
+has additional complexity and memory consumption compared to the single end2end models for the sake of high accuracy.
+Following table shows the mean inference time on a single RTX 4090 (VRAM 24 GB) in second averaged over 10 trials on audio sample with different durations, along with the parameter size.
+
 
 | model | Param. (M) | 10 | 30 | 60 | 300 |
 |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------:|------:|------:|------:|------:|
-| [**kotoba-tech/kotoba-whisper-bilingual-v1.0**](https://huggingface.co/kotoba-tech/kotoba-whisper-bilingual-v1.0) | | 0.041 | 0.111 | 0.214 | 1.077 |
-| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B)) | | 0.173 | 0.247 | 0.352 | 1.772 |
-| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B)) | | 0.173 | 0.24 | 0.348 | 1.515 |
-| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)) | | 0.17 | 0.245 | 0.348 | 1.882 |
-| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)) | | 0.108 | 0.179 | 0.283 | 1.33 |
+| [**kotoba-tech/kotoba-whisper-bilingual-v1.0**](https://huggingface.co/kotoba-tech/kotoba-whisper-bilingual-v1.0) | 756 | 0.041 | 0.111 | 0.214 | 1.077 |
+| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B)) | 4056 | 0.173 | 0.247 | 0.352 | 1.772 |
+| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B)) | 2056 | 0.173 | 0.24 | 0.348 | 1.515 |
+| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)) | 2056 | 0.17 | 0.245 | 0.348 | 1.882 |
+| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)) | 1256 | 0.108 | 0.179 | 0.283 | 1.33 |
 
 ## Transformers Usage
 Kotoba-Whisper is supported in the Hugging Face 🤗 Transformers library from version 4.39 onwards. To run the model, first
```
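For readers who want to sanity-check numbers like those in the updated table, the following is a minimal sketch, not the authors' benchmark script: it loads kotoba-tech/kotoba-whisper-bilingual-v1.0 through the standard 🤗 Transformers ASR pipeline (version 4.39 or later, as the README states) and averages wall-clock latency over 10 trials on one audio clip. The synthetic audio, the single-GPU device index, and the `generate_kwargs` values are illustrative assumptions.

```python
import time

import numpy as np
from transformers import pipeline

# Load the end2end model via the standard ASR pipeline (transformers >= 4.39).
asr = pipeline(
    "automatic-speech-recognition",
    model="kotoba-tech/kotoba-whisper-bilingual-v1.0",
    device=0,  # assumes a single CUDA GPU, e.g. the RTX 4090 used for the table
)

# Synthetic stand-in audio; Whisper-family models expect 16 kHz input.
sampling_rate = 16_000
duration_sec = 30  # one of the durations reported in the table (10 / 30 / 60 / 300 s)
audio = np.random.randn(duration_sec * sampling_rate).astype(np.float32)

# Average wall-clock inference time over repeated trials, as in the table.
n_trials = 10
times = []
for _ in range(n_trials):
    start = time.time()
    asr(
        {"raw": audio, "sampling_rate": sampling_rate},
        generate_kwargs={"language": "ja", "task": "transcribe"},  # assumed settings
    )
    times.append(time.time() - start)

print(f"mean inference time over {n_trials} trials: {np.mean(times):.3f} s")
```

A real measurement would use actual speech rather than noise and typically include a warm-up run before timing.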