kotoba-tech/kotoba-whisper-bilingual-v1.0

Automatic Speech Recognition

hf-asr-leaderboard


asahi417 committed on Sep 30, 2024

Commit 5b2c01e · verified · 1 Parent(s): 09aae83

Update README.md

Files changed (1)

README.md +1 -1


@@ -122,7 +122,7 @@ has additional complexity and memory consumption compared to the single end2end
 Following table shows the mean inference time on a single RTX 4090 (VRAM 24 GB) in second averaged over 10 trials on audio sample with different durations, along with the parameter size.
-| model                                                                                                                                                                                                     | Param. (M) |    10 |    30 |    60 |   300 |
 |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------:|------:|------:|------:|------:|
 | [**kotoba-tech/kotoba-whisper-bilingual-v1.0**](https://huggingface.co/kotoba-tech/kotoba-whisper-bilingual-v1.0)                                                                                         |        756 | 0.041 | 0.111 | 0.214 | 1.077 |
 | [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B))                     |       4056 | 0.173 | 0.247 | 0.352 | 1.772 |

 Following table shows the mean inference time on a single RTX 4090 (VRAM 24 GB) in second averaged over 10 trials on audio sample with different durations, along with the parameter size.
+| model                                                                                                                                                                                                     | Param. (M) | 10 (sec.) | 30 (sec.) | 60 (sec.) | 300 (sec.) |
 |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------:|------:|------:|------:|------:|
 | [**kotoba-tech/kotoba-whisper-bilingual-v1.0**](https://huggingface.co/kotoba-tech/kotoba-whisper-bilingual-v1.0)                                                                                         |        756 | 0.041 | 0.111 | 0.214 | 1.077 |
 | [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B))                     |       4056 | 0.173 | 0.247 | 0.352 | 1.772 |
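
To make the table easier to read, the following minimal sketch derives two figures from the timings listed above: the speedup of the end2end model over the cascaded one, and the real-time factor (inference time divided by audio duration). The numbers are copied from the table; the dictionary and function names (`END2END`, `CASCADED`, `speedup`, `real_time_factor`) are illustrative and do not come from the repository.

```python
# Mean inference times in seconds, copied from the table above,
# keyed by audio duration in seconds.
END2END = {10: 0.041, 30: 0.111, 60: 0.214, 300: 1.077}   # kotoba-whisper-bilingual-v1.0
CASCADED = {10: 0.173, 30: 0.247, 60: 0.352, 300: 1.772}  # en-cascaded-s2t-translation + nllb-200-3.3B


def speedup(baseline, candidate):
    """Return baseline time / candidate time for each audio duration."""
    return {d: baseline[d] / candidate[d] for d in baseline}


def real_time_factor(times):
    """Return inference time / audio duration (lower means faster)."""
    return {d: t / d for d, t in times.items()}


speedups = speedup(CASCADED, END2END)
rtf = real_time_factor(END2END)

for d in sorted(speedups):
    print(f"{d:>3} s audio: speedup x{speedups[d]:.2f}, end2end RTF {rtf[d]:.4f}")
```

The ratio is largest for the shortest clip and levels off for longer audio, which can be read directly from the two table rows.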