Update README.md
README.md
CHANGED
@@ -1,3 +1,87 @@
---
language: cs
datasets:
- CommonVoice
- CC100
tags:
- automatic-speech-recognition
- whisper
- knowledge-distillation
- czech
license: mit
---

# Whisper Tiny Czech (Knowledge Distillation from MLM)

This model is a fine-tuned version of Whisper Tiny adapted for Czech automatic speech recognition (ASR) using **knowledge distillation (KD)** from a **masked language model (MLM)**.

## Model Description

During early experiments, we observed that Whisper Tiny often produced invalid or unpronounceable Czech words even when given ground-truth context. To address this, we trained a Czech MLM to act as a language teacher during Whisper's fine-tuning.

- **Teacher Model**: BiLSTM-based masked language model (60M parameters) trained on a 210 MB subset of the CC100-Czech dataset.
- **Distillation Approach**: At each decoding step, Whisper was trained not only with the standard cross-entropy loss on the next token but also encouraged to align its token distribution with the one predicted by the MLM via a KL-divergence loss (see the sketch after this list).
- **Tokenizer**: Same byte-pair encoding (BPE) as Whisper.
- **Training Data**: CommonVoice Czech 19.0 for speech; CC100-Czech for language modelling.
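
The teacher's exact interface is not reproduced here; the sketch below shows one way a per-step teacher distribution could be obtained under this setup. The names `mlm` (a BiLSTM that maps BPE token IDs to per-position vocabulary logits) and `mask_id` are hypothetical, not identifiers from the released code.

```python
# Hypothetical sketch: query a BiLSTM masked LM for the distribution of the
# token at position t, given the rest of the ground-truth transcript.
import torch

def mlm_teacher_distribution(mlm, token_ids, t, mask_id):
    """token_ids: (batch, seq_len) ground-truth BPE tokens (Whisper's tokenizer)."""
    masked = token_ids.clone()
    masked[:, t] = mask_id                      # hide the token Whisper is about to predict
    logits = mlm(masked)                        # (batch, seq_len, vocab_size)
    return torch.softmax(logits[:, t], dim=-1)  # teacher distribution for position t
```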
### Loss Function

The training loss combined the standard ASR loss with the KD loss:

\[
L_{t} = \lambda_{lm} \, \mathrm{CE}\left(p_{\text{asr}}, y_t\right) + (1 - \lambda_{lm}) \, \mathrm{KL}\left(p_{\text{asr}} \,\|\, p_{\text{mlm}}\right)
\]

where \(p_{\text{asr}}\) is Whisper's predicted token distribution at step \(t\), \(y_t\) is the ground-truth token, \(p_{\text{mlm}}\) is the MLM's prediction for that position, and \(\lambda_{lm}\) balances the two components.
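
A minimal PyTorch sketch of this combined objective is given below, assuming `asr_logits` are Whisper's decoder logits for one step and `mlm_probs` is the teacher distribution from the earlier sketch; the function and argument names are illustrative rather than the released training code.

```python
import torch.nn.functional as F

def distillation_loss(asr_logits, target_ids, mlm_probs, lambda_lm):
    # Standard cross-entropy against the ground-truth token.
    ce = F.cross_entropy(asr_logits, target_ids)
    # KD term: F.kl_div takes the student's log-probabilities and the teacher's
    # probabilities, pulling Whisper's distribution towards the MLM prediction.
    kld = F.kl_div(F.log_softmax(asr_logits, dim=-1), mlm_probs,
                   reduction="batchmean")
    # L_t = lambda_lm * CE + (1 - lambda_lm) * KLD, as in the formula above.
    return lambda_lm * ce + (1 - lambda_lm) * kld
```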
### Hyperparameters

| Model             | Learning Rate | KD Lambda | Batch Size |
|-------------------|---------------|-----------|------------|
| Tiny Baseline     | 5e-4          | -         | 8          |
| Tiny Adapted (KD) | 1e-4          | 1e-3      | 8          |

The learning rates differ because each configuration was tuned separately.

### Results on CommonVoice Czech

| Model             | Validation Loss | WER   | CER   |
|-------------------|-----------------|-------|-------|
| Tiny Baseline     | 1.236           | 0.447 | 0.031 |
| Tiny Adapted (KD) | 0.636           | 0.345 | 0.023 |

✅ **CER reduced by ~25%**
✅ **WER reduced by ~23%**

This shows that even light knowledge distillation from a compact MLM substantially improves the language-modelling ability of Whisper Tiny for Czech.

---

## Intended Use

This model is intended for research and applications in Czech ASR that need lightweight, efficient models with a stronger grasp of the language.
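
A minimal usage sketch with the 🤗 Transformers `pipeline` API is shown below; the model id is a placeholder to be replaced with this repository's actual id.

```python
# Minimal usage sketch; the model id below is a placeholder.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="your-namespace/whisper-tiny-czech-kd",  # placeholder id
)
print(asr("sample_czech_audio.wav")["text"])
```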
## Limitations

- Trained on a relatively small subset (210 MB) of CC100-Czech due to computational constraints.
- Optimized for clean, non-code-switched Czech speech (based on CommonVoice data).

## Acknowledgments

- Knowledge distillation ([Hinton et al., 2015](https://arxiv.org/abs/1503.02531))
- Whisper model family ([OpenAI, 2022](https://openai.com/research/whisper))
- CommonVoice dataset ([Mozilla, 2020](https://commonvoice.mozilla.org))
- CC100 dataset ([Conneau et al., 2020](https://arxiv.org/abs/1911.02116))

## Citation

If you use this model, please cite (yes, the main topic of the thesis was indeed assistive ASR):

```
@misc{nadrchal_2025,
  title={Deep-Learning ASR for a Patient with Permanent Tracheostomy: A Case Study},
  author={David Nadrchal},
  year={2025},
  note={Bachelor's Thesis},
  url={https://github.com/Hobit2002/TracheoSpeech_ASR}
}
```