---
license: cc-by-4.0
datasets:
- amphion/Emilia-Dataset
language:
- ko
base_model:
- ResembleAI/chatterbox
pipeline_tag: text-to-speech
tags:
- audio
- speech
- tts
- fine-tuning
- chatterbox
- Emilia
- voice-cloning
- zero-shot
- korean
---
# Chatterbox TTS Korean
**Chatterbox TTS Korean** is a text-to-speech model fine-tuned from [Chatterbox](https://huggingface.co/ResembleAI/chatterbox) and specialized for the Korean language. It was trained on high-quality voice data for natural, expressive speech synthesis.
- **Language**: Korean
- **Training dataset**: [Emilia Dataset (KO branch)](https://huggingface.co/datasets/amphion/Emilia-Dataset)
- **Data quantity**: 200 hours of audio
## Usage Example
Here's how to generate speech using Chatterbox TTS Korean:
```python
import torch
import torch.nn as nn
import soundfile as sf
from chatterbox.tts import ChatterboxTTS
from chatterbox.models.tokenizers import EnTokenizer
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Configuration
MODEL_REPO = "Thomcles/Chatterbox-TTS-Korean"
T3_FILENAME = "t3_cfg.safetensors"
TOKENIZER_FILENAME = "tokenizer_en_ko.json"
OUTPUT_PATH = "output_cloned_voice.wav"
TEXT_TO_SYNTHESIZE = "로마는 하루아침에 이루어진 것이 아니다"
TEXT_VOCAB_SIZE = 4715 + 1  # extended text vocabulary used by the fine-tuned checkpoint


def get_device() -> str:
    return "cuda" if torch.cuda.is_available() else "cpu"


def download_checkpoint(repo: str, filename: str) -> str:
    return hf_hub_download(repo_id=repo, filename=filename)


def load_tts_model(repo: str, checkpoint_file: str, tokenizer_file: str, device: str) -> ChatterboxTTS:
    # Start from the base Chatterbox weights.
    model = ChatterboxTTS.from_pretrained(device=device)

    # Swap in the English + Korean tokenizer.
    # Assumption: the tokenizer file is hosted in the same repo; if you already
    # have it locally, pass that path to EnTokenizer instead.
    tokenizer_path = download_checkpoint(repo, tokenizer_file)
    model.tokenizer = EnTokenizer(tokenizer_path)

    # Resize the text embedding and output head to the extended vocabulary
    # before loading, so the fine-tuned weights fit and are not overwritten.
    model.t3.text_emb = nn.Embedding(TEXT_VOCAB_SIZE, model.t3.dim)
    model.t3.text_head = nn.Linear(model.t3.cfg.hidden_size, TEXT_VOCAB_SIZE, bias=False)

    # Load the fine-tuned T3 weights, then move the new layers to the target device.
    checkpoint_path = download_checkpoint(repo, checkpoint_file)
    t3_state = load_file(checkpoint_path, device="cpu")
    model.t3.load_state_dict(t3_state)
    model.t3.to(device).eval()
    return model


def synthesize_speech(model: ChatterboxTTS, text: str, audio_prompt_path: str | None, **kwargs) -> torch.Tensor:
    with torch.inference_mode():
        return model.generate(
            text=text,
            audio_prompt_path=audio_prompt_path,
            **kwargs
        )


def save_audio(waveform: torch.Tensor, path: str, sample_rate: int):
    sf.write(path, waveform.squeeze().cpu().numpy(), sample_rate)


def main():
    print("Loading model...")
    device = get_device()
    model = load_tts_model(MODEL_REPO, T3_FILENAME, TOKENIZER_FILENAME, device)

    print(f"Generating speech on {device}...")
    wav = synthesize_speech(
        model,
        TEXT_TO_SYNTHESIZE,
        # None falls back to the base model's default voice;
        # pass the path to a reference recording to clone a voice.
        audio_prompt_path=None,
        exaggeration=0.5,
        temperature=0.6,
        cfg_weight=0.3,
    )

    print(f"Saving output to: {OUTPUT_PATH}")
    save_audio(wav, OUTPUT_PATH, model.sr)
    print("Done.")


if __name__ == "__main__":
    main()
```
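The script resizes `text_emb` and `text_head` to 4715 + 1 entries because the fine-tuned checkpoint uses an extended text vocabulary. As a rough sketch of how one might confirm that a checkpoint fits the model before calling `load_state_dict`, the helper below compares parameter shapes. `find_shape_mismatches` is a hypothetical helper and the shapes in the example are illustrative assumptions; neither is part of Chatterbox.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    expected: Dict[str, Shape], actual: Dict[str, Shape]
) -> List[str]:
    """Return one line per parameter that is missing, unexpected, or differently shaped."""
    problems = []
    for name in sorted(expected.keys() | actual.keys()):
        if name not in actual:
            problems.append(f"{name}: missing from checkpoint")
        elif name not in expected:
            problems.append(f"{name}: not expected by model")
        elif expected[name] != actual[name]:
            problems.append(
                f"{name}: model {expected[name]} vs checkpoint {actual[name]}"
            )
    return problems


# Illustrative shapes only (assumed, not read from the real model): text layers
# that were not resized, compared against a checkpoint with 4715 + 1 tokens.
model_shapes = {"text_emb.weight": (704, 1024), "text_head.weight": (704, 1024)}
checkpoint_shapes = {"text_emb.weight": (4716, 1024), "text_head.weight": (4716, 1024)}

for line in find_shape_mismatches(model_shapes, checkpoint_shapes):
    print(line)
```

With the real objects, the two dictionaries can be built as `{k: tuple(v.shape) for k, v in model.t3.state_dict().items()}` and the same comprehension over the dictionary returned by `load_file`. An empty result suggests the checkpoint should load without size errors.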
### Base model license
The base model is licensed under the MIT License.
- **Base model**: [Chatterbox](https://huggingface.co/ResembleAI/chatterbox)
- **License**: [MIT](https://choosealicense.com/licenses/mit/)
### Training Data License
This model was fine-tuned using a dataset licensed under Creative Commons Attribution 4.0 (CC BY 4.0).
- **Dataset**: [Emilia](https://huggingface.co/datasets/amphion/Emilia-Dataset)
- **License**: [Creative Commons Attribution 4.0 International](https://choosealicense.com/licenses/cc-by-4.0/)
### Contact me
Interested in fine-tuning a TTS model in a specific language or building a multilingual voice solution? Don't hesitate to reach out.