For this task, load the word error rate (WER) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):

```py
import evaluate

wer = evaluate.load("wer")
```

Then create a function that passes your predictions and labels to [`~evaluate.EvaluationModule.compute`] to calculate the WER:

```py
import numpy as np


def compute_metrics(pred):
    pred_logits = pred.predictions
    # Take the most likely token at each timestep (greedy decoding)
    pred_ids = np.argmax(pred_logits, axis=-1)

    # Replace -100 (ignored positions) with the pad token id so they can be decoded
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id

    pred_str = processor.batch_decode(pred_ids)
    # Don't group repeated tokens in the labels; they aren't CTC output
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)

    # Use a distinct local name so the global `wer` metric isn't shadowed
    wer_score = wer.compute(predictions=pred_str, references=label_str)

    return {"wer": wer_score}
```

Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
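If you'd like to sanity-check the metric itself before wiring it into training, you can call [`~evaluate.EvaluationModule.compute`] directly on a couple of strings. This is a minimal sketch; the transcripts below are made up purely for illustration:

```py
import evaluate

wer = evaluate.load("wer")

# Hypothetical transcripts for illustration only
predictions = ["hello world", "good morning"]
references = ["hello word", "good morning"]

# One substitution ("world" vs. "word") out of 4 reference words -> 0.25
print(wer.compute(predictions=predictions, references=references))
```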