Note that T5 uses the pad_token_id as the decoder_start_token_id, so when doing generation without using [~generation.GenerationMixin.generate], make sure you start it with the pad_token_id.