Update README.md
### Inference Speed
Although the cascaded approach performs better on the translation task, the pipeline by its nature has additional complexity and memory consumption compared to a single end-to-end model, in exchange for higher accuracy.
The following table shows the mean inference time in seconds on a single RTX 4090 (24 GB VRAM), averaged over 10 trials on audio samples of different durations, along with each model's parameter size. A sketch of the timing procedure follows the table.

| model | Param. (M) | 10 s | 30 s | 60 s | 300 s |
|:------|-----------:|-----:|-----:|-----:|------:|
| [**kotoba-tech/kotoba-whisper-bilingual-v1.0**](https://huggingface.co/kotoba-tech/kotoba-whisper-bilingual-v1.0) | 756 | 0.041 | 0.111 | 0.214 | 1.077 |
| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B)) | 4056 | 0.173 | 0.247 | 0.352 | 1.772 |
| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B)) | 2056 | 0.173 | 0.24 | 0.348 | 1.515 |
| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)) | 2056 | 0.17 | 0.245 | 0.348 | 1.882 |
| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)) | 1256 | 0.108 | 0.179 | 0.283 | 1.33 |
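
For reference, the snippet below sketches how such timings could be collected with the Transformers pipeline. It is an assumption rather than the actual benchmark script: the model ID comes from the table above, while the audio file name and the `language`/`task` arguments are placeholders.

```python
# A minimal timing sketch (not the authors' benchmark code): measure the mean
# latency of the ASR pipeline over 10 trials, as described above.
import time

import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="kotoba-tech/kotoba-whisper-bilingual-v1.0",
    torch_dtype=torch.float16,
    device="cuda:0",
)

times = []
for _ in range(10):
    start = time.time()
    # "sample_10s.wav" is a placeholder for a clip of the target duration.
    pipe("sample_10s.wav", generate_kwargs={"language": "ja", "task": "transcribe"})
    times.append(time.time() - start)

print(f"mean inference time: {sum(times) / len(times):.3f} s")
```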
## Transformers Usage
Kotoba-Whisper is supported in the Hugging Face 🤗 Transformers library from version 4.39 onwards. To run the model, first install the latest version of Transformers.
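
A minimal usage sketch under those assumptions is shown below; the `pip install --upgrade transformers` prerequisite and the `language`/`task` arguments are illustrative, not taken from this section.

```python
# Sketch of typical usage; assumes: pip install --upgrade transformers
import torch
from transformers import pipeline

# Load the bilingual checkpoint through the standard ASR pipeline.
pipe = pipeline(
    "automatic-speech-recognition",
    model="kotoba-tech/kotoba-whisper-bilingual-v1.0",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# Japanese transcription; "audio.wav" is a placeholder input. The model card
# also covers English ASR and ja<->en speech-to-text translation via other
# language/task combinations.
result = pipe("audio.wav", generate_kwargs={"language": "ja", "task": "transcribe"})
print(result["text"])
```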