Whisper large-v3 fine-tuning using our own dataset

#189
by rifasca

I encountered issues while fine-tuning the Whisper-large-v3 model on a 100-hour Arabic dataset using the LoRA-PEFT approach. The resulting transcriptions were highly inaccurate, with excessive hallucinations and frequent duplication of characters.

Hello, I think you're using LoRA and only fine-tuning the q_linear and v_linear projections. You could try applying LoRA to all linear layers instead, as in the sketch below. Also, I believe the Whisper-large-v3 tokenizer performs poorly for low-resource languages.
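
For reference, here is a minimal sketch of what targeting all linear layers could look like with the Hugging Face `peft` library. The module names (`q_proj`, `k_proj`, `v_proj`, `out_proj`, `fc1`, `fc2`) follow the `transformers` Whisper implementation rather than the `q_linear`/`v_linear` names above, and the hyperparameters are only illustrative, not a recommendation:

```python
# Minimal sketch (not the poster's exact setup): apply LoRA to all of
# Whisper's linear projections instead of only the query/value projections.
# Module names follow the Hugging Face transformers Whisper implementation;
# r, lora_alpha, and lora_dropout are illustrative values only.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    # Target every linear layer in the attention and feed-forward blocks,
    # not just the query and value projections.
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "fc1", "fc2"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # report how many parameters are trainable
```

Training the adapted model then proceeds as usual (e.g. with `Seq2SeqTrainer`); targeting more modules increases the number of trainable parameters, so you may need to adjust the learning rate or rank accordingly.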
