In Flax, one can use the decoder_attention_mask to exclude padded tokens from
the loss (see the Flax summarization script for details).
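
As a rough illustration of the idea, the sketch below computes a token-level cross-entropy in JAX and uses the mask to drop padded positions before averaging. The function name `masked_cross_entropy`, the toy shapes, and the convention that the mask is 1 for real tokens and 0 for padding are assumptions made for this example. It is not taken from the Flax summarization script, which may compute the loss differently.

```python
import jax
import jax.numpy as jnp


def masked_cross_entropy(logits, labels, decoder_attention_mask):
    """Mean cross-entropy over non-padded target positions (hypothetical helper).

    logits: (batch, seq_len, vocab_size) float array
    labels: (batch, seq_len) int array of target token ids
    decoder_attention_mask: (batch, seq_len), assumed 1 for real tokens, 0 for padding
    """
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    # Pick the log-probability assigned to each target token.
    token_log_probs = jnp.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    mask = decoder_attention_mask.astype(log_probs.dtype)
    # Zero out padded positions, then average over the real tokens only.
    total = -(token_log_probs * mask).sum()
    return total / jnp.maximum(mask.sum(), 1.0)


# Toy batch: 2 sequences of length 4, vocabulary of 5 tokens.
logits = jnp.zeros((2, 4, 5))
labels = jnp.array([[1, 2, 3, 0], [4, 0, 0, 0]])
decoder_attention_mask = jnp.array([[1, 1, 1, 0], [1, 0, 0, 0]])
loss = masked_cross_entropy(logits, labels, decoder_attention_mask)
```

Dividing by the number of unmasked tokens rather than by the full sequence length keeps the loss scale independent of how much padding a batch happens to contain.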