To finetune a model in TensorFlow, start by setting up an optimizer function, learning rate schedule, and some training hyperparameters: from transformers import create_optimizer, AdamWeightDecay optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01) Then you can load DistilRoBERTa with [TFAutoModelForMaskedLM]: from transformers import TFAutoModelForMaskedLM model = TFAutoModelForMaskedLM.from_pretrained("distilbert/distilroberta-base") Convert your datasets to the tf.data.Dataset format with [~transformers.TFPreTrainedModel.prepare_tf_dataset]: tf_train_set = model.prepare_tf_dataset( lm_dataset["train"], shuffle=True, batch_size=16, collate_fn=data_collator, ) tf_test_set = model.prepare_tf_dataset( lm_dataset["test"], shuffle=False, batch_size=16, collate_fn=data_collator, ) Configure the model for training with compile.