---
library_name: transformers
license: apache-2.0
model-index:
- name: umt5-thai-g2p-9
  results:
  - task:
      type: text2text-generation
      name: Grapheme-to-Phoneme Conversion
    dataset:
      name: B-K/thai-g2p
      type: B-K/thai-g2p
      config: default
      split: sentence_validation
    metrics:
    - type: cer
      value: 0.094
      name: Character Error Rate
    - type: loss
      value: 1.5449
      name: Loss
datasets:
- B-K/thai-g2p
language:
- th
metrics:
- cer
pipeline_tag: text2text-generation
widget:
- text: สวัสดีครับ
  example_title: Thai G2P Example
new_version: B-K/umt5-thai-g2p-v2-0.5k
---

# umt5-thai-g2p

This model is a fine-tuned version of [google/umt5-small](https://huggingface.co/google/umt5-small) on the [B-K/thai-g2p](https://huggingface.co/datasets/B-K/thai-g2p) dataset for Thai Grapheme-to-Phoneme (G2P) conversion.

It achieves the following results on the sentence evaluation set:
- Loss: 1.5449
- CER: 0.094

## Model Description

`umt5-thai-g2p` is designed to convert Thai text (words or sentences) into their corresponding phonemic International Phonetic Alphabet (IPA) representations.

## Intended uses & limitations

### Intended Uses

*   **Thai Grapheme-to-Phoneme (G2P) Conversion**: The primary use of this model is to generate phonemic transcriptions (IPA) for Thai text.
*   **Speech Synthesis Preprocessing**: Can be used as a component in a Text-to-Speech (TTS) pipeline to convert input text into phonemes before acoustic model processing; a batch-conversion sketch follows this list.
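
As a rough illustration of the TTS-preprocessing use case, the hypothetical helper below converts a batch of Thai sentences into phoneme strings that a downstream acoustic model could consume. It is a sketch built on the usage shown in the "How to use" section below, not an API shipped with this repository.

```python
# Illustrative sketch only: thai_to_phonemes is a hypothetical helper,
# not part of this repository.
from typing import List

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("B-K/umt5-thai-g2p")
model = AutoModelForSeq2SeqLM.from_pretrained("B-K/umt5-thai-g2p")

def thai_to_phonemes(sentences: List[str]) -> List[str]:
    """Convert a batch of Thai sentences to IPA phoneme strings."""
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**inputs, num_beams=3, max_new_tokens=48)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

# The resulting phoneme strings would then be passed to whichever acoustic
# model or TTS front end you use; that step is out of scope for this model.
print(thai_to_phonemes(["สวัสดีครับ", "ขอบคุณมาก"]))
```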

### Limitations

*   **Accuracy**: While the model achieves a Character Error Rate (CER) of approximately 0.094 on the evaluation set, it is not 100% accurate. Users should expect some errors in the generated phonemes.
*   **Out-of-Distribution Data**: Performance may degrade on words, phrases, or sentence structures significantly different from those present in the `B-K/thai-g2p` training dataset. This includes very rare words, neologisms, or complex named entities.
*   **Ambiguity**: Thai orthography can sometimes be ambiguous, and the model might not always resolve such ambiguities correctly to the intended pronunciation in all contexts.
*   **Sentence-Level vs. Word-Level**: While trained on a dataset that includes sentences, its robustness for very long or highly complex sentences might vary. The average generated length observed during training was around 27 tokens.
*   **Inherited Limitations**: As a fine-tuned version of `google/umt5-small`, it inherits the general architectural limitations and scale of the base model.

## How to use

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("B-K/umt5-thai-g2p")
model = AutoModelForSeq2SeqLM.from_pretrained("B-K/umt5-thai-g2p")

thai_text = "สวัสดีครับ" # Example Thai text
inputs = tokenizer(thai_text, return_tensors="pt", padding=True, truncation=True)

outputs = model.generate(**inputs, num_beams=3, max_new_tokens=48)
phonemes = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(f"Thai Text: {thai_text}")
print(f"Phonemes: {phonemes}")
```

## Training procedure

### Training Hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):
* optimizer: adamw_torch
* learning_rate: 5e-4, decayed to 5e-6 by the end of training
* lr_scheduler_type: cosine
* num_train_epochs: approximately 200 in total (the training settings were retuned across several runs)
* per_device_train_batch_size: 128
* per_device_eval_batch_size: 128
* weight_decay: 0.01 initially, increased to 0.1 in later runs
* label_smoothing_factor: 0.1
* max_grad_norm: 1.0
* warmup_steps: 100
* mixed_precision: bf16
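
For reference, these settings map onto a `Seq2SeqTrainingArguments` configuration roughly as sketched below. This is illustrative only: the exact training script is not included in this card, and the learning rate, weight decay, and epoch count were adjusted across runs.

```python
# Illustrative configuration sketch; not the exact script used for training.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-thai-g2p",          # hypothetical output directory
    optim="adamw_torch",
    learning_rate=5e-4,                  # decayed toward 5e-6 over the run
    lr_scheduler_type="cosine",
    num_train_epochs=200,                # approximate; settings were retuned
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    weight_decay=0.01,                   # raised to 0.1 in later runs
    label_smoothing_factor=0.1,
    max_grad_norm=1.0,
    warmup_steps=100,
    bf16=True,
    eval_strategy="epoch",               # per-epoch eval, matching the results table
    predict_with_generate=True,          # needed to compute CER and Gen Len
)
```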

### Training results

| Training Loss | Epoch | Step | Validation Loss | CER    | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:------:|:-------:|
| No log        | 1.0   | 134  | 1.5636          | 0.0917 | 27.1747 |
| No log        | 2.0   | 268  | 1.5603          | 0.093  | 27.1781 |
| No log        | 3.0   | 402  | 1.5566          | 0.0938 | 27.1729 |
| 1.1631        | 4.0   | 536  | 1.5524          | 0.0941 | 27.1678 |
| 1.1631        | 5.0   | 670  | 1.5508          | 0.0939 | 27.113  |
| 1.1631        | 6.0   | 804  | 1.5472          | 0.0932 | 27.1575 |
| 1.1631        | 7.0   | 938  | 1.5450          | 0.0933 | 27.1421 |
| 1.1603        | 8.0   | 1072 | 1.5449          | 0.094  | 27.0616 |
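
The validation CER above can be reproduced approximately with the `evaluate` library. The sketch below is illustrative rather than the exact evaluation script; in particular, the dataset column names (`text`, `phoneme`) are assumptions and should be checked against the `B-K/thai-g2p` dataset.

```python
# Illustrative CER evaluation sketch; the column names "text" and "phoneme"
# are assumptions -- check them against the B-K/thai-g2p dataset.
import evaluate
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("B-K/umt5-thai-g2p")
model = AutoModelForSeq2SeqLM.from_pretrained("B-K/umt5-thai-g2p")

dataset = load_dataset("B-K/thai-g2p", split="sentence_validation")
cer_metric = evaluate.load("cer")

predictions, references = [], []
for example in dataset.select(range(64)):      # small sample for illustration
    inputs = tokenizer(example["text"], return_tensors="pt")
    output = model.generate(**inputs, num_beams=3, max_new_tokens=48)
    predictions.append(tokenizer.decode(output[0], skip_special_tokens=True))
    references.append(example["phoneme"])

print("CER:", cer_metric.compute(predictions=predictions, references=references))
```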


### Framework versions

- Transformers 4.47.0
- Pytorch 2.5.1
- Datasets 3.6.0
- Tokenizers 0.21.0