codewithdark
/

WhisperLiveSubs

Automatic Speech Recognition

Generated from Trainer


codewithdark committed on Sep 4, 2024

Commit

d6c54e1


1 Parent(s): b638487

Create README.md

Files changed (1)

README.md +80 -0

README.md ADDED

	@@ -0,0 +1,80 @@

+# Model Card for WhisperLiveSubs
+This model is a fine-tuned version of OpenAI's Whisper model on the Common Voice dataset for Urdu speech recognition. It is optimized for transcribing Urdu language audio.
+### Model Description
+WhisperLiveSubs is the small Whisper variant (`openai/whisper-small`) fine-tuned on the Urdu subset of the Common Voice dataset. It is intended for automatic speech recognition (ASR) of Urdu speech. With the WER of 51.06 reported in the Results section, transcripts should be expected to contain errors.
+- **Developed by:** codewithdark
+- **Model type:** Whisper-based model for ASR
+- **Language(s) (NLP):** Urdu (ur)
+- **License:** Apache 2.0
+- **Finetuned from model:** openai/whisper-small
+## Uses
+### Direct Use
+This model can be used directly for transcribing Urdu audio into text. It is suitable for applications such as:
+- Voice-to-text transcription services
+- Captioning Urdu language videos
+- Speech analytics in Urdu
+### Out-of-Scope Use
+The model may not perform well for:
+- Non-Urdu languages
+- Extremely noisy environments
+- Very long audio sequences without segmentation
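The last point can be handled by segmenting long recordings before transcription. Whisper's feature extractor works on 30-second windows, so one simple approach is to cut the waveform into fixed-length chunks and transcribe each chunk separately. The sketch below is a minimal illustration, not part of the model's own tooling: `split_into_chunks` is a hypothetical helper name, and fixed-length cuts can split a word at a chunk boundary.

```python
def split_into_chunks(samples, sample_rate=16000, chunk_seconds=30.0):
    """Cut a 1-D sequence of audio samples into consecutive fixed-length chunks.

    The last chunk may be shorter than the others. `split_into_chunks` is a
    hypothetical helper written for this example.
    """
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_len = int(sample_rate * chunk_seconds)
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


# 70 seconds of silence at 16 kHz, used only to demonstrate the chunk sizes.
waveform = [0.0] * (70 * 16000)
chunks = split_into_chunks(waveform)
print([len(c) / 16000 for c in chunks])  # chunk durations in seconds
```

Each chunk can then be passed to the processor and model in turn, and the resulting text pieces joined in order.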
+## How to Get Started with the Model
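+Use the code below to get started with the model. The example assumes `librosa` is installed for audio loading, and `audio.wav` is a placeholder path for your own recording.
+```python
+import librosa
+from transformers import WhisperProcessor, WhisperForConditionalGeneration
+processor = WhisperProcessor.from_pretrained("codewithdark/WhisperLiveSubs")
+model = WhisperForConditionalGeneration.from_pretrained("codewithdark/WhisperLiveSubs")
+audio, _ = librosa.load("audio.wav", sr=16000)  # placeholder path; loads mono audio at 16 kHz
+inputs = processor(audio, sampling_rate=16000, return_tensors="pt")  # log-mel input features
+predicted_ids = model.generate(inputs.input_features)
+print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
+```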
+### Training Data
+The model was fine-tuned on the Mozilla Common Voice dataset, specifically the Urdu subset. The dataset consists of approximately [number of hours] of transcribed Urdu speech.
+#### Preprocessing
+The audio was resampled to 16 kHz, the sampling rate Whisper's feature extractor expects, and the text was tokenized using the Whisper tokenizer configured for Urdu.
+#### Training Hyperparameters
+- **Training regime:** Mixed precision (fp16)
+- **Batch size:** 8
+- **Gradient accumulation steps:** 2
+- **Learning rate:** 1e-5
+- **Max steps:** 4000
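As a rough worked example, the hyperparameters above imply the effective batch size and the total number of training examples processed. The arithmetic below rests on two assumptions that the card does not state explicitly: that the batch size of 8 is per device, and that training ran on a single GPU.

```python
# Values copied from the hyperparameter list above.
per_device_batch_size = 8
gradient_accumulation_steps = 2
max_steps = 4000

# Assumption: a single GPU was used, so there is no data-parallel multiplier.
num_devices = 1

# Gradients from several small batches are accumulated before each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
examples_processed = effective_batch_size * max_steps

print(f"effective batch size: {effective_batch_size}")  # 16
print(f"examples processed:   {examples_processed}")    # 64000
```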
+#### Metrics
+Word Error Rate (WER) was the primary metric used to evaluate the model's performance.
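For reference, WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model output, divided by the number of words in the reference. The sketch below is a minimal pure-Python illustration of that definition, not the evaluation script used for this model; `word_error_rate` is a hypothetical helper name, and it returns a fraction rather than a percentage.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

If the score reported for this model is expressed as a percentage (an assumption; the card does not state the scale), 51.06 corresponds to roughly one word error for every two reference words.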
+### Results
+- **Training Loss:** 0.2005
+- **Validation Loss:** 0.5342
+- **WER:** 51.06
+*This is my first attempt at fine-tuning this model, so the current WER is high. Further improvements can be made to raise the model's accuracy and reduce the WER.*
+### Environmental Impact
+- **Hardware Type:** P100 GPU
+- **Hours used:** 10
+- **Cloud Provider:** Kaggle
+- **Compute Region:** PK
+### Model Architecture and Objective
+WhisperLiveSubs is based on the Whisper architecture, an encoder-decoder Transformer designed for automatic speech recognition.
+#### Software
+- **Framework:** PyTorch
+- **Transformers Version:**
+#### Summary
+The model reaches a WER of 51.06 on Urdu transcription, which leaves substantial room for improvement, especially in noisy conditions or with diverse accents.
+## Model Card Contact
+For inquiries, please contact codewithdark90@gmail.com.
+## Citation
+@Codewithdark. (2024). WhisperLiveSubs: An Urdu Automatic Speech Recognition Model. Retrieved from https://huggingface.co/codewithdark/WhisperLiveSubs