asahi417 committed · Commit 09aae83 · verified · 1 Parent(s): 4779456

Update README.md

Files changed (1): README.md (+8 −7)
README.md CHANGED
```diff
@@ -118,16 +118,17 @@ distil whisper is English ASR only).
 
 ### Inference Speed
 Although the cascaded approach is better in translation task, due to the nature of cascaded approach, the pipeline
-has additional complexity compared to the single end2end models for the sake of high accuracy.
-Following table shows the mean inference time on a single RTX 4090 (VRAM 24 GB) in second averaged over 10 trials on audio sample with different durations.
+has additional complexity and memory consumption compared to the single end2end models for the sake of high accuracy.
+Following table shows the mean inference time on a single RTX 4090 (VRAM 24 GB) in second averaged over 10 trials on audio sample with different durations, along with the parameter size.
+
 
 | model | Param. (M) | 10 | 30 | 60 | 300 |
 |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------:|------:|------:|------:|------:|
-| [**kotoba-tech/kotoba-whisper-bilingual-v1.0**](https://huggingface.co/kotoba-tech/kotoba-whisper-bilingual-v1.0) | | 0.041 | 0.111 | 0.214 | 1.077 |
-| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B)) | | 0.173 | 0.247 | 0.352 | 1.772 |
-| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B)) | | 0.173 | 0.24 | 0.348 | 1.515 |
-| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)) | | 0.17 | 0.245 | 0.348 | 1.882 |
-| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)) | | 0.108 | 0.179 | 0.283 | 1.33 |
+| [**kotoba-tech/kotoba-whisper-bilingual-v1.0**](https://huggingface.co/kotoba-tech/kotoba-whisper-bilingual-v1.0) | 756 | 0.041 | 0.111 | 0.214 | 1.077 |
+| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B)) | 4056 | 0.173 | 0.247 | 0.352 | 1.772 |
+| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B)) | 2056 | 0.173 | 0.24 | 0.348 | 1.515 |
+| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)) | 2056 | 0.17 | 0.245 | 0.348 | 1.882 |
+| [japanese-asr/en-cascaded-s2t-translation](https://huggingface.co/japanese-asr/en-cascaded-s2t-translation) ([facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)) | 1256 | 0.108 | 0.179 | 0.283 | 1.33 |
 
 ## Transformers Usage
 Kotoba-Whisper is supported in the Hugging Face 🤗 Transformers library from version 4.39 onwards. To run the model, first
```
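For readers who want to sanity-check numbers like those in the updated table, the following is a minimal sketch, not the authors' benchmark script: it loads kotoba-tech/kotoba-whisper-bilingual-v1.0 through the standard 🤗 Transformers ASR pipeline (version 4.39 or later, as the README states) and averages wall-clock latency over 10 trials on one audio clip. The synthetic audio, the single-GPU device index, and the `generate_kwargs` values are illustrative assumptions.

```python
import time

import numpy as np
from transformers import pipeline

# Load the end2end model via the standard ASR pipeline (transformers >= 4.39).
asr = pipeline(
    "automatic-speech-recognition",
    model="kotoba-tech/kotoba-whisper-bilingual-v1.0",
    device=0,  # assumes a single CUDA GPU, e.g. the RTX 4090 used for the table
)

# Synthetic stand-in audio; Whisper-family models expect 16 kHz input.
sampling_rate = 16_000
duration_sec = 30  # one of the durations reported in the table (10 / 30 / 60 / 300 s)
audio = np.random.randn(duration_sec * sampling_rate).astype(np.float32)

# Average wall-clock inference time over repeated trials, as in the table.
n_trials = 10
times = []
for _ in range(n_trials):
    start = time.time()
    asr(
        {"raw": audio, "sampling_rate": sampling_rate},
        generate_kwargs={"language": "ja", "task": "transcribe"},  # assumed settings
    )
    times.append(time.time() - start)

print(f"mean inference time over {n_trials} trials: {np.mean(times):.3f} s")
```

A real measurement would use actual speech rather than noise and typically include a warm-up run before timing.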