with torch.no_grad(): speech = vocoder(spectrogram) from IPython.display import Audio Audio(speech.numpy(), rate=16000) In our experience, obtaining satisfactory results from this model can be challenging.