Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
raw
history blame contribute delete
322 Bytes
This will use the inputs as labels shifted to the right by one element:
from transformers import DataCollatorForLanguageModeling
tokenizer.pad_token = tokenizer.eos_token
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
Use the end-of-sequence token as the padding token and set mlm=False.