---
library_name: transformers
tags: []
---

# Speech-to-text model for Uzbek

### Model Description

The Whisper model was fine-tuned with LoRA (Low-Rank Adaptation) to reduce training time and make efficient use of compute resources (GPU/CPU).

- Base model: [whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) (over 1.5 billion parameters)
- LoRA fine-tuned model: [whisper-large-lora-uz](https://huggingface.co/ShakhzoDavronov/whisper-large-lora-uz) (around 15 million trainable parameters)

### Datasets

The model was fine-tuned on the Uzbek subset of the popular [Common Voice 13.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0/viewer/uz) dataset.

### Testing Model

You can test the model's performance with the code below:

```python
import torch
from transformers import AutomaticSpeechRecognitionPipeline
from transformers import WhisperTokenizer, WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel, PeftConfig

stt_model_id = "ShakhzoDavronov/whisper-large-lora-uz"
language = "Uzbek"
task = "transcribe"

# Read the adapter config to find the base checkpoint, then load it in 8-bit
stt_config = PeftConfig.from_pretrained(stt_model_id)
stt_model = WhisperForConditionalGeneration.from_pretrained(
    stt_config.base_model_name_or_path, load_in_8bit=True, device_map="auto"
)
# Attach the LoRA adapter weights on top of the base model
stt_model = PeftModel.from_pretrained(stt_model, stt_model_id)

stt_tokenizer = WhisperTokenizer.from_pretrained(stt_config.base_model_name_or_path, language=language, task=task)
stt_processor = WhisperProcessor.from_pretrained(stt_config.base_model_name_or_path, language=language, task=task)
stt_feature_extractor = stt_processor.feature_extractor

# Force transcription in Uzbek instead of letting Whisper auto-detect the language
forced_decoder_ids = stt_processor.get_decoder_prompt_ids(language=language, task=task)

stt_pipe = AutomaticSpeechRecognitionPipeline(
    model=stt_model, tokenizer=stt_tokenizer, feature_extractor=stt_feature_extractor
)

def transcribe(audio):
    with torch.cuda.amp.autocast():
        text = stt_pipe(audio, generate_kwargs={"forced_decoder_ids": forced_decoder_ids}, max_new_tokens=255)["text"]
    return text
```

```python
# test_audio can be a path to an audio file or a 16 kHz waveform array
extracted_text = transcribe(test_audio)
print(extracted_text)
```

Results:

```
Soon
```

### Training Metrics

* WER: ~46.0
* Normalized WER: ~33.0
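
The evaluation split and text normalization behind these numbers are not documented in this card. As a rough sketch of how such figures can be reproduced, assuming the `datasets` and `evaluate` libraries, the `transcribe` function defined above, and Whisper's `BasicTextNormalizer` (the split, subset size, and normalizer are assumptions, not the exact protocol used for the reported numbers):

```python
import evaluate
from datasets import Audio, load_dataset
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

# Assumption: the "test" split and 100-sample subset are illustrative only.
# Common Voice is gated, so you must accept its terms on the Hub and
# authenticate (e.g. `huggingface-cli login`) before load_dataset will work.
cv_test = load_dataset("mozilla-foundation/common_voice_13_0", "uz", split="test")
cv_test = cv_test.cast_column("audio", Audio(sampling_rate=16_000))

wer_metric = evaluate.load("wer")
normalizer = BasicTextNormalizer()

predictions, references = [], []
for sample in cv_test.select(range(100)):  # small subset to keep the run short
    predictions.append(transcribe(sample["audio"]["array"]))
    references.append(sample["sentence"])

wer = 100 * wer_metric.compute(predictions=predictions, references=references)
norm_wer = 100 * wer_metric.compute(
    predictions=[normalizer(p) for p in predictions],
    references=[normalizer(r) for r in references],
)
print(f"WER: {wer:.1f}, Normalized WER: {norm_wer:.1f}")
```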
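
The fine-tuning hyperparameters are likewise not published here. For illustration only, a common PEFT LoRA recipe for Whisper that lands in the roughly 15 million trainable parameter range mentioned in the Model Description looks like this (rank, alpha, dropout, and target modules are assumed values, not the configuration actually used to train whisper-large-lora-uz):

```python
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

# Assumed hyperparameters: illustrative defaults from common Whisper + PEFT
# recipes, not the published configuration of this model.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts
```

With rank 32 applied to every `q_proj` and `v_proj` in whisper-large-v2, the adapter adds about 15.7 million trainable weights, roughly 1% of the 1.5 billion parameter base model, which is what makes the fine-tuning far cheaper than a full update.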