Likewise, the randomness of the noise is controlled by `model.noise_scale`:

```python
import torch
from transformers import VitsTokenizer, VitsModel, set_seed

tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-eng")
model = VitsModel.from_pretrained("facebook/mms-tts-eng")

inputs = tokenizer(text="Hello - my dog is cute", return_tensors="pt")

# make deterministic
set_seed(555)

# make speech faster and more noisy
model.speaking_rate = 1.5
model.noise_scale = 0.8

with torch.no_grad():
    outputs = model(**inputs)
```

## Language Identification (LID)

Different LID models are available based on the number of languages they can recognize: 126, 256, 512, 1024, 2048, and 4017.
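As a rough illustration of choosing among these variants, the sketch below maps a language count to a checkpoint name. The `facebook/mms-lid-<N>` naming pattern and the helper names `lid_checkpoint` and `load_lid` are assumptions made for this example, not part of the library; check the Hub for the exact model ids before relying on them.

```python
# Language counts listed above for the available LID models.
LID_SIZES = (126, 256, 512, 1024, 2048, 4017)


def lid_checkpoint(num_languages: int) -> str:
    """Build the (assumed) Hub id for the LID model covering `num_languages` languages."""
    if num_languages not in LID_SIZES:
        raise ValueError(
            f"No LID model for {num_languages} languages; choose one of {LID_SIZES}"
        )
    # Assumption: checkpoints follow the pattern facebook/mms-lid-<N>.
    return f"facebook/mms-lid-{num_languages}"


def load_lid(num_languages: int = 126):
    """Hypothetical helper: load the feature extractor and classifier for one variant."""
    # Imported lazily so lid_checkpoint() works without transformers installed.
    from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

    model_id = lid_checkpoint(num_languages)
    processor = AutoFeatureExtractor.from_pretrained(model_id)
    model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)
    return processor, model
```

Calling `load_lid(126)` would download the smallest variant; the larger variants recognize more languages but may differ in accuracy on any single one, so the best choice depends on which languages you expect to see.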