The training loss combined the standard ASR loss with a knowledge-distillation (KD) loss:

$$
L_t = \lambda_{lm} \, \text{CE}(\text{asr}, \text{true token}) + (1 - \lambda_{lm}) \, \text{KLD}(\text{asr distribution}, \text{mlm prediction})
$$
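
where $\lambda_{lm}$ weights the cross-entropy term on the true token against the KL-divergence term toward the MLM prediction. Because the two weights are $\lambda_{lm}$ and $1 - \lambda_{lm}$, setting $\lambda_{lm} = 1$ recovers plain ASR training.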
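
The objective can be sketched per token in plain Python. This is a minimal illustration, not the training code behind this README. It assumes that `KLD(asr distribution, mlm prediction)` means $\mathrm{KL}(p_{\text{asr}} \,\|\, p_{\text{mlm}})$, following the argument order in the formula, that the MLM prediction is already a probability distribution over the same vocabulary as the ASR decoder, that $\lambda_{lm} \in [0, 1]$, and that per-token losses are averaged over the sequence. The function names (`softmax`, `kl_divergence`, `combined_loss`) are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i).

    Terms with p_i == 0 contribute nothing; q_i is clamped to eps to avoid log(0).
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def combined_loss(
    asr_logits: Sequence[Sequence[float]],  # one ASR logit vector per output token
    true_tokens: Sequence[int],             # ground-truth token ids
    mlm_probs: Sequence[Sequence[float]],   # assumed: MLM teacher distribution per token
    lambda_lm: float,
) -> float:
    """Hypothetical sketch of L_t, averaged over the tokens of one sequence."""
    # Assumption: lambda_lm is a mixing weight in [0, 1].
    if not 0.0 <= lambda_lm <= 1.0:
        raise ValueError("lambda_lm must lie in [0, 1]")
    if not true_tokens or not (len(asr_logits) == len(true_tokens) == len(mlm_probs)):
        raise ValueError("inputs must be non-empty and aligned per token")
    total = 0.0
    for logits, target, teacher in zip(asr_logits, true_tokens, mlm_probs):
        p_asr = softmax(logits)
        ce = -math.log(p_asr[target])        # CE(asr, true token)
        kld = kl_divergence(p_asr, teacher)  # assumed KL(p_asr || p_mlm)
        total += lambda_lm * ce + (1.0 - lambda_lm) * kld
    return total / len(true_tokens)


if __name__ == "__main__":
    logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    targets = [0, 2]
    teacher = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
    print(combined_loss(logits, targets, teacher, lambda_lm=0.5))
```

Many distillation setups use the reverse direction, $\mathrm{KL}(p_{\text{mlm}} \,\|\, p_{\text{asr}})$; if that is the intended reading, only the argument order of `kl_divergence` changes.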

### Hyperparameters