Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
The following code example runs a forward pass using the MMS-TTS English checkpoint:
thon
import torch
from transformers import VitsTokenizer, VitsModel, set_seed
tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-eng")
model = VitsModel.from_pretrained("facebook/mms-tts-eng")
inputs = tokenizer(text="Hello - my dog is cute", return_tensors="pt")
set_seed(555) # make deterministic
with torch.no_grad():
outputs = model(**inputs)
waveform = outputs.waveform[0]
The resulting waveform can be saved as a .wav file:
thon
import scipy
scipy.io.wavfile.write("synthesized_speech.wav", rate=model.config.sampling_rate, data=waveform)
Or displayed in a Jupyter Notebook / Google Colab:
thon
from IPython.display import Audio
Audio(waveform, rate=model.config.sampling_rate)
For certain languages with non-Roman alphabets, such as Arabic, Mandarin or Hindi, the uroman
perl package is required to pre-process the text inputs to the Roman alphabet.