codewithdark committed on
Commit
d6c54e1
·
verified ·
1 Parent(s): b638487

Create README.md

  1. README.md +80 -0
README.md ADDED
@@ -0,0 +1,80 @@
+ # Model Card for WhisperLiveSubs
+ This model is a fine-tuned version of OpenAI's Whisper model, trained on the Common Voice dataset for Urdu speech recognition.
+
+ ### Model Description
+ This model is the small variant of Whisper (openai/whisper-small), fine-tuned on the Urdu subset of the Common Voice dataset. It is intended for automatic speech recognition (ASR) of Urdu speech.
+ - **Developed by:** codewithdark
+ - **Model type:** Whisper-based model for ASR
+ - **Language(s) (NLP):** Urdu (ur)
+ - **License:** Apache 2.0
+ - **Fine-tuned from model:** openai/whisper-small
+
+ ## Uses
+ ### Direct Use
+ This model can be used directly for transcribing Urdu audio into text. It is suitable for applications such as:
+ - Voice-to-text transcription services
+ - Captioning Urdu language videos
+ - Speech analytics in Urdu
+
+ ### Out-of-Scope Use
+ The model may not perform well for:
+ - Non-Urdu languages
+ - Extremely noisy environments
+ - Very long audio sequences without segmentation
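The last item above concerns segmentation: Whisper models operate on windows of about 30 seconds, so longer recordings are usually split before transcription. The following is a minimal sketch of fixed-length chunking with a small overlap, not the author's pipeline; the function name `split_into_chunks` and the default overlap of 1 second are assumptions made for illustration.

```python
def split_into_chunks(samples, sample_rate=16000, chunk_s=30.0, overlap_s=1.0):
    """Split a 1-D sequence of audio samples into overlapping fixed-length chunks.

    Hypothetical helper for illustration; the 1-second overlap is an assumed default.
    """
    chunk = int(chunk_s * sample_rate)           # samples per chunk
    step = chunk - int(overlap_s * sample_rate)  # hop between chunk starts
    if step <= 0:
        raise ValueError("overlap_s must be smaller than chunk_s")

    chunks = []
    for start in range(0, len(samples), step):
        chunks.append(samples[start:start + chunk])
        # Stop once a chunk reaches the end of the recording.
        if start + chunk >= len(samples):
            break
    return chunks
```

Each returned chunk can then be transcribed on its own and the texts joined. The overlap reduces the chance of cutting a word in half at a boundary, at the cost of possible duplicated words that need to be merged afterwards.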
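+
+ ## How to Get Started with the Model
+ Use the code below to get started with the model.
+
+ ```python
+ import librosa  # assumption: any loader that returns 16 kHz mono audio works
+ import torch
+ from transformers import WhisperProcessor, WhisperForConditionalGeneration
+
+ processor = WhisperProcessor.from_pretrained("codewithdark/WhisperLiveSubs")
+ model = WhisperForConditionalGeneration.from_pretrained("codewithdark/WhisperLiveSubs")
+
+ # Load a clip at the 16 kHz rate Whisper expects ("sample.wav" is a placeholder path)
+ audio, sr = librosa.load("sample.wav", sr=16000)
+ inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
+ with torch.no_grad():
+     ids = model.generate(inputs.input_features)
+ print(processor.batch_decode(ids, skip_special_tokens=True)[0])
+ ```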
+
+ ### Training Data
+ The model was fine-tuned on the Urdu subset of the Mozilla Common Voice dataset, which consists of approximately [number of hours] of transcribed Urdu speech.
+
+ #### Preprocessing
+ The audio was resampled to 16 kHz, and the text was tokenized using the Whisper tokenizer configured for Urdu.
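To illustrate what the resampling step does, here is a dependency-free sketch that converts a clip to 16 kHz by linear interpolation. It is a simplification and not the method used for this model: real pipelines normally rely on a library resampler (for example in `librosa`, `torchaudio`, or `datasets`) that also applies anti-aliasing filtering. The name `resample_linear` is hypothetical.

```python
def resample_linear(samples, orig_sr, target_sr=16000):
    """Resample a 1-D sequence of samples with linear interpolation.

    Illustrative only: no low-pass filtering is applied, so downsampling
    real audio this way can introduce aliasing.
    """
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or orig_sr == target_sr:
        return list(samples)

    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    ratio = orig_sr / target_sr  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * ratio
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last interval: hold the final sample.
            out.append(float(samples[-1]))
        else:
            frac = pos - left
            out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out
```

For example, a clip stored at 48 kHz has three input samples for every output sample at 16 kHz, so the output is one third as long.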
+
+ #### Training Hyperparameters
+ - **Training regime:** Mixed precision (fp16)
+ - **Batch size:** 8
+ - **Gradient accumulation steps:** 2
+ - **Learning rate:** 1e-5
+ - **Max steps:** 4000
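The arithmetic behind these settings can be made explicit. The sketch below records the listed values in a small dataclass and derives the effective batch size and the number of training examples processed. It assumes the batch size is per device, that a single GPU was used, and that max steps counts optimizer updates; none of these is stated in the card, and `TrainConfig` is a hypothetical name rather than a library class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as listed in the card, plus one assumed field."""
    per_device_batch_size: int = 8
    gradient_accumulation_steps: int = 2
    learning_rate: float = 1e-5
    max_steps: int = 4000
    fp16: bool = True
    num_devices: int = 1  # assumption: single GPU

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer update.
        return (self.per_device_batch_size
                * self.gradient_accumulation_steps
                * self.num_devices)

    @property
    def examples_seen(self) -> int:
        # Total examples processed if max_steps counts optimizer updates.
        return self.effective_batch_size * self.max_steps


config = TrainConfig()
print(config.effective_batch_size, config.examples_seen)
```

Under these assumptions each update averages gradients over 16 examples, and the full run processes 64,000 examples, counting repeats across epochs.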
+
+ #### Metrics
+ Word Error Rate (WER) was the primary metric used to evaluate the model's performance.
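WER is the word-level edit distance between the reference transcript and the model output (substitutions, deletions, and insertions), divided by the number of words in the reference. The sketch below is a plain-Python illustration of that definition; it is not the evaluation code used for this model, and it splits on whitespace without any text normalization, which real evaluations often add.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Scores are often reported multiplied by 100. If the WER listed in this card follows that convention, it would correspond to a fraction of about 0.51 from this function.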
+
+ ### Results
+
+ - **Training Loss:** 0.2005
+ - **Validation Loss:** 0.5342
+ - **WER:** 51.06
+
+ *This was my first attempt at fine-tuning this model. The current results are a starting point; further training and tuning can improve accuracy and reduce the WER.*
+
+ - **Hardware Type:** P100 GPU
+ - **Hours used:** 10
+ - **Cloud Provider:** Kaggle
+ - **Compute Region:** PK
+
+ ### Model Architecture and Objective
+ WhisperLiveSubs is based on the Whisper architecture (openai/whisper-small), an encoder-decoder Transformer designed for automatic speech recognition.
+
+ #### Software
+ - **Framework:** PyTorch
+ - **Transformers Version:**
+
+ #### Summary
+ The model reaches a WER of 51.06 on Urdu transcription, which leaves considerable room for improvement, especially in noisy conditions or with diverse accents.
+
+ ## Model Card Contact
+ For inquiries, please contact codewithdark90@gmail.com
+
+ @Codewithdark. (2024). WhisperLiveSubs: An Urdu Automatic Speech Recognition Model. Retrieved from https://huggingface.co/codewithdark/WhisperLiveSubs