You can check whether you require the uroman package for your language by inspecting the is_uroman attribute of the pre-trained tokenizer: thon from transformers import VitsTokenizer tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-eng") print(tokenizer.is_uroman) If required, you should apply the uroman package to your text inputs prior to passing them to the VitsTokenizer, since currently the tokenizer does not support performing the pre-processing itself.