Apply the preprocessing function over the entire dataset with the 🤗 Datasets map method. You can speed it up by setting batched=True to process multiple elements at once and increasing num_proc, and remove any columns you don't need:

tokenized_eli5 = eli5.map(
    preprocess_function,
    batched=True,  # tokenize multiple examples per call
    num_proc=4,  # parallelize across four processes
    remove_columns=eli5["train"].column_names,  # drop the original text columns
)

This dataset contains the token sequences, but some of these are longer than the maximum input length for the model.
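
A common way to handle this is to concatenate all the sequences and then split them into fixed-length chunks, so no tokens are discarded by truncation. The sketch below is illustrative rather than the guide's exact code: the helper name group_texts and the block_size of 128 are assumptions, and block_size should really match your model's maximum input length.

def group_texts(examples):
    block_size = 128  # assumed chunk length; set to your model's max input length

    # Concatenate every field (input_ids, attention_mask, ...) across the batch.
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated[list(examples.keys())[0]])

    # Drop the small remainder so every chunk has exactly block_size tokens;
    # you could pad the final chunk instead if you prefer.
    total_length = (total_length // block_size) * block_size

    # Split each field into chunks of block_size.
    return {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }

lm_dataset = tokenized_eli5.map(group_texts, batched=True, num_proc=4)

Chunking without padding keeps every training example fully packed with real tokens, which is why this concatenate-and-split pattern is common for language-model pretraining data.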