Since the original target sequence length may be odd, the data collator rounds the maximum target length of the batch down to the nearest multiple of 2.
data_collator = TTSDataCollatorWithPadding(processor=processor)
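For illustration, here is a minimal sketch of what that rounding step could look like inside a collator. The helper name and the tensor layout are assumptions; only the reduction factor of 2 comes from the text above:

import torch

def round_down_to_multiple(labels: torch.Tensor, factor: int = 2) -> torch.Tensor:
    # labels: padded spectrogram targets, assumed shape (batch, time, num_mels)
    max_len = labels.shape[1]
    rounded = (max_len // factor) * factor  # drops at most factor - 1 trailing frames
    return labels[:, :rounded]

Truncating a frame or two at most is harmless here, since the targets are padded spectrograms rather than text tokens.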
Train the model
Load the pre-trained model from the same checkpoint as you used for loading the processor:
from transformers import SpeechT5ForTextToSpeech

model = SpeechT5ForTextToSpeech.from_pretrained(checkpoint)
The use_cache=True option is incompatible with gradient checkpointing. Disable it for training:
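# caching decoder key/value states conflicts with gradient checkpointing
model.config.use_cache = False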