File size: 322 Bytes
5fa1a76 |
1 2 3 4 5 6 7 |
This will use the inputs as labels shifted to the right by one element: from transformers import DataCollatorForLanguageModeling tokenizer.pad_token = tokenizer.eos_token data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False) Use the end-of-sequence token as the padding token and set mlm=False. |