In Flax, one can use the decoder_attention_mask to exclude padded tokens from
the loss (see the Flax summarization script for details).
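
As a rough illustration of the idea, the sketch below computes a token-level cross-entropy in JAX and multiplies it by the mask before averaging, so padded positions contribute nothing. The function name masked_cross_entropy and the toy shapes are assumptions made for this example; the Flax summarization script may differ in details such as label smoothing and how the loss is normalized.

```python
import jax
import jax.numpy as jnp


def masked_cross_entropy(logits, labels, decoder_attention_mask):
    """Mean cross-entropy over non-padded target positions (hypothetical helper).

    logits: (batch, seq_len, vocab_size) float array
    labels: (batch, seq_len) int array of target token ids
    decoder_attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    # Select the log-probability assigned to each target token.
    token_log_probs = jnp.take_along_axis(
        log_probs, labels[..., None], axis=-1
    )[..., 0]
    mask = decoder_attention_mask.astype(log_probs.dtype)
    # Zero out padded positions, then average over real tokens only.
    total = -(token_log_probs * mask).sum()
    return total / jnp.maximum(mask.sum(), 1.0)


# Toy batch: one sequence of length 3 whose last position is padding.
logits = jnp.zeros((1, 3, 4))
labels = jnp.array([[1, 2, 0]])
decoder_attention_mask = jnp.array([[1, 1, 0]])
loss = masked_cross_entropy(logits, labels, decoder_attention_mask)

# Changing the logits at the padded position leaves the loss unchanged.
logits_changed = logits.at[0, 2, 3].set(10.0)
loss_changed = masked_cross_entropy(
    logits_changed, labels, decoder_attention_mask
)
```

Dividing by the number of unmasked tokens rather than by the full sequence length keeps the loss scale independent of how much padding a batch happens to contain.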