Likewise, the randomness of the noise is controlled by model.noise_scale:
```python
import torch
from transformers import VitsTokenizer, VitsModel, set_seed

tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-eng")
model = VitsModel.from_pretrained("facebook/mms-tts-eng")

inputs = tokenizer(text="Hello - my dog is cute", return_tensors="pt")

# make deterministic
set_seed(555)

# make speech faster and more noisy
model.speaking_rate = 1.5
model.noise_scale = 0.8

with torch.no_grad():
    outputs = model(**inputs)
```
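To build intuition for what these two attributes do, here is a simplified, library-free sketch. It assumes the usual VITS formulation, in which predicted durations are divided by the speaking rate and the sampled latent noise is multiplied by the noise scale. The helper names `scaled_duration` and `sample_latent` are hypothetical and are not part of transformers; the actual model operates on tensors and may differ in detail.

```python
import math
import random


def scaled_duration(log_duration: float, speaking_rate: float) -> int:
    """Hypothetical helper: frames assigned to one token.

    Assumes durations are divided by the speaking rate, so a higher
    rate yields fewer frames and therefore faster speech.
    """
    return math.ceil(math.exp(log_duration) / speaking_rate)


def sample_latent(mean: float, log_std: float, noise_scale: float,
                  rng: random.Random) -> float:
    """Hypothetical helper: draw one latent value around `mean`.

    Assumes the Gaussian noise is multiplied by `noise_scale`, so 0.0
    gives a deterministic output and larger values give more variation.
    """
    return mean + rng.gauss(0.0, 1.0) * math.exp(log_std) * noise_scale


# A higher speaking rate shortens the predicted duration.
normal_frames = scaled_duration(2.0, speaking_rate=1.0)
fast_frames = scaled_duration(2.0, speaking_rate=1.5)

# With the same seed, a larger noise scale moves the sample further from the mean.
quiet = sample_latent(1.0, 0.0, noise_scale=0.4, rng=random.Random(555))
noisy = sample_latent(1.0, 0.0, noise_scale=0.8, rng=random.Random(555))
```

Seeding the generator plays the same role here as `set_seed(555)` does for the model: the noise is still drawn at random, but the draw is reproducible.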
Language Identification (LID)
Several LID models are available, differing in the number of languages they can recognize: 126, 256, 512, 1024, 2048, or 4017.
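As a small illustration of choosing between these variants, the sketch below maps a language count to a checkpoint name. The `facebook/mms-lid-<N>` naming pattern is an assumption based on the published MMS checkpoints, and `lid_checkpoint` is a hypothetical helper, not a transformers function; verify the identifier on the Hub before loading it.

```python
# Language counts listed for the LID models.
SUPPORTED_LID_SIZES = (126, 256, 512, 1024, 2048, 4017)


def lid_checkpoint(num_languages: int) -> str:
    """Return the assumed Hub identifier for an LID model of the given size.

    Assumption: checkpoints follow the "facebook/mms-lid-<N>" pattern.
    """
    if num_languages not in SUPPORTED_LID_SIZES:
        raise ValueError(
            f"No LID model for {num_languages} languages; "
            f"choose one of {SUPPORTED_LID_SIZES}"
        )
    return f"facebook/mms-lid-{num_languages}"


model_id = lid_checkpoint(126)
```

The resulting string could then be passed to a `from_pretrained` call, assuming the checkpoint exists under that name.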