Before passing your predictions to compute, you need to convert the logits to predictions (remember all 🤗 Transformers models return logits):

def compute_metrics(eval_pred):
     logits, labels = eval_pred
     predictions = np.argmax(logits, axis=-1)
     return metric.compute(predictions=predictions, references=labels)

If you'd like to monitor your evaluation metrics during fine-tuning, specify the evaluation_strategy parameter in your training arguments to report the evaluation metric at the end of each epoch:

from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

Trainer
Create a [Trainer] object with your model, training arguments, training and test datasets, and evaluation function:

trainer = Trainer(
     model=model,
     args=training_args,
     train_dataset=small_train_dataset,
     eval_dataset=small_eval_dataset,
     compute_metrics=compute_metrics,
 )

Then fine-tune your model by calling [~transformers.Trainer.train]:

trainer.train()

Train a TensorFlow model with Keras
You can also train 🤗 Transformers models in TensorFlow with the Keras API!