Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
Note that T5 uses the pad_token_id as the decoder_start_token_id, so when doing generation without using
[~generation.GenerationMixin.generate], make sure you start it with the pad_token_id.