Apply the preprocessing function over the entire dataset, and remove any columns you don't need:

```py
tokenized_eli5 = eli5.map(
    preprocess_function,
    batched=True,
    num_proc=4,
    remove_columns=eli5["train"].column_names,
)
```

The dataset now contains the token sequences, but some of these are longer than the maximum input length for the model.
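One common way to deal with overlong sequences is to concatenate all of them and then split the result into fixed-length chunks. The sketch below assumes a `block_size` of 128; the function name `group_texts` and the block size are illustrative choices, not prescribed values:

```py
block_size = 128

def group_texts(examples):
    # Concatenate every field (e.g. input_ids, attention_mask) across the batch.
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated[list(examples.keys())[0]])
    # Drop the small remainder so every chunk has exactly block_size tokens.
    total_length = (total_length // block_size) * block_size
    # Split the concatenated sequences into chunks of block_size.
    return {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }

lm_dataset = tokenized_eli5.map(group_texts, batched=True, num_proc=4)
```

Because `group_texts` changes the number of examples, it must run with `batched=True` so `map` can return a different number of rows than it received.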