---
license: cc-by-nc-4.0
pipeline_tag: automatic-speech-recognition
base_model:
- openai/whisper-small
library_name: transformers
language:
- ami
- trv
- bnn
- pwn
- tay
- tsu
- tao
- dru
- xsy
- pyu
- szy
- ckv
- sxr
- ssf
- xnb
---

# Model Card for whisper-small-formosan-all

This model is a version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) fine-tuned on Taiwanese indigenous (Formosan) languages.

Note: this model uses Indonesian (`id`) as the Whisper language ID, so pass `language="id"` when transcribing.

### Training process

The model was trained with the following hyperparameters:

- Batch size: 32*4 (on 4 L40s GPUs)
- Gradient accumulation steps: 8
- Total steps: 1600
- Learning rate: 1.25e-5
- Data augmentation: No
- Optimizer: schedule_free_adamw
- LR scheduler type: constant

### How to use

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU with half precision when available, otherwise CPU with float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "formospeech/whisper-small-formosan-all"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)
model.to(device)

# The processor bundles the tokenizer and the feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,  # split long audio into 30-second chunks
    batch_size=16,
    torch_dtype=torch_dtype,
    device=device,
)

# The model was fine-tuned with Indonesian ("id") as the Whisper language ID.
generate_kwargs = {"language": "id"}
transcription = pipe("path/to/my_audio.wav", generate_kwargs=generate_kwargs)
print(transcription)
```
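
### Effective batch size

The hyperparameters listed under "Training process" can be combined into an effective batch size. The sketch below is arithmetic only and rests on two assumptions that the card does not state explicitly: that `32*4` means a per-device batch of 32 on each of 4 GPUs, and that "Total steps" counts optimizer updates. The variable names are illustrative and do not come from the training script.

```python
# Assumption: "32*4" means a per-device batch of 32 on each of 4 GPUs.
per_device_batch_size = 32
num_gpus = 4
gradient_accumulation_steps = 8
total_steps = 1600

# Number of samples contributing to a single optimizer update.
effective_batch_size = per_device_batch_size * num_gpus * gradient_accumulation_steps

# Samples processed over the run, assuming "total steps" counts optimizer updates.
samples_seen = effective_batch_size * total_steps

print(effective_batch_size)  # 1024
print(samples_seen)          # 1638400
```

If "Total steps" instead counts forward/backward passes, the number of optimizer updates and the samples processed would both be 8 times smaller.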