A composition of the following transformations is applied to the pretraining tasks for the encoder (a minimal sketch of these noising functions appears after the Implementation Notes below):

- mask random tokens (like in BERT)
- delete random tokens
- mask a span of k tokens with a single mask token (a span of 0 tokens is an insertion of a mask token)
- permute sentences
- rotate the document to make it start at a specific token

## Implementation Notes

- Bart doesn't use `token_type_ids` for sequence classification.
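As an illustration of the noising transformations listed above, here is a minimal token-level sketch using only the Python standard library. The `MASK` string, the probabilities, and the function names are assumptions for illustration, not BART's actual pretraining code:

```python
import random

MASK = "<mask>"  # hypothetical mask-token string

def mask_random_tokens(tokens, p=0.15):
    """Replace each token with MASK with probability p (BERT-style masking)."""
    return [MASK if random.random() < p else t for t in tokens]

def delete_random_tokens(tokens, p=0.15):
    """Drop each token with probability p; the model must infer which positions are missing."""
    return [t for t in tokens if random.random() >= p]

def mask_span(tokens, start, k):
    """Replace the span tokens[start:start+k] with a single MASK token.

    With k == 0 this inserts a MASK token at `start` without removing anything.
    """
    return tokens[:start] + [MASK] + tokens[start + k:]

def permute_sentences(sentences):
    """Shuffle whole sentences into a random order."""
    shuffled = sentences[:]
    random.shuffle(shuffled)
    return shuffled

def rotate_document(tokens):
    """Rotate the token sequence so it starts at a uniformly chosen token."""
    pivot = random.randrange(len(tokens))
    return tokens[pivot:] + tokens[:pivot]

if __name__ == "__main__":
    random.seed(0)
    toks = "the quick brown fox jumps over the lazy dog".split()
    print(mask_random_tokens(toks))
    print(delete_random_tokens(toks))
    print(mask_span(toks, start=2, k=3))  # "brown fox jumps" -> <mask>
    print(mask_span(toks, start=2, k=0))  # a 0-token span: pure insertion of <mask>
    print(permute_sentences(["A b.", "C d.", "E f."]))
    print(rotate_document(toks))
```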
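To see the implementation note in practice: unlike BERT, the encoding returned for a sentence pair contains no `token_type_ids`. A short sketch, assuming the `facebook/bart-large` checkpoint is available:

```python
from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")

# Encode a sentence pair; the output has input_ids and attention_mask,
# but no token_type_ids key.
enc = tokenizer("Hello world", "How are you?")
print(enc.keys())
```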