Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
remove_idx = [590, 821, 822, 875, 876, 878, 879]
keep = [i for i in range(len(cppe5["train"])) if i not in remove_idx]
cppe5["train"] = cppe5["train"].select(keep)
Preprocess the data
To finetune a model, you must preprocess the data you plan to use to match precisely the approach used for the pre-trained model.