Whisper large-v3 fine-tuning using our own dataset

#189
by rifasca

I encountered issues while fine-tuning the Whisper-large-v3 model on a 100-hour Arabic dataset using the LoRA-PEFT approach. The resulting transcriptions were highly inaccurate, with excessive hallucinations and frequent duplication of characters.

Hello, I think you're using LoRA and only fine-tuning the q_linear and v_linear projections. You could try applying LoRA to all linear layers instead, as in the sketch below. Also, I believe the Whisper-large-v3 tokenizer performs poorly for low-resource languages.
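
For reference, here is a minimal sketch of what targeting all linear layers could look like with the Hugging Face `peft` library. The module names (`q_proj`, `k_proj`, `v_proj`, `out_proj`, `fc1`, `fc2`) follow the `transformers` Whisper implementation rather than the `q_linear`/`v_linear` names above, and the hyperparameters are only illustrative, not a recommendation:

```python
# Minimal sketch (not the poster's exact setup): apply LoRA to all of
# Whisper's linear projections instead of only the query/value projections.
# Module names follow the Hugging Face transformers Whisper implementation;
# r, lora_alpha, and lora_dropout are illustrative values only.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    # Target every linear layer in the attention and feed-forward blocks,
    # not just the query and value projections.
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "fc1", "fc2"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # report how many parameters are trainable
```

Training the adapted model then proceeds as usual (e.g. with `Seq2SeqTrainer`); targeting more modules increases the number of trainable parameters, so you may need to adjust the learning rate or rank accordingly.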
