Remove any columns you don't need: tokenized_squad = squad.map(preprocess_function, batched=True, remove_columns=squad["train"].column_names) Now create a batch of examples using [DefaultDataCollator].