The following code example runs a forward pass using the MMS-TTS English checkpoint:

thon
import torch
from transformers import VitsTokenizer, VitsModel, set_seed
tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-eng")
model = VitsModel.from_pretrained("facebook/mms-tts-eng")
inputs = tokenizer(text="Hello - my dog is cute", return_tensors="pt")
set_seed(555)  # make deterministic
with torch.no_grad():
   outputs = model(**inputs)
waveform = outputs.waveform[0]

The resulting waveform can be saved as a .wav file:
thon
import scipy
scipy.io.wavfile.write("synthesized_speech.wav", rate=model.config.sampling_rate, data=waveform)

Or displayed in a Jupyter Notebook / Google Colab:
thon
from IPython.display import Audio
Audio(waveform, rate=model.config.sampling_rate)

For certain languages with non-Roman alphabets, such as Arabic, Mandarin or Hindi, the uroman 
perl package is required to pre-process the text inputs to the Roman alphabet.