For this task, load the word error rate (WER) metric (see the 🤗 Evaluate quick tour to learn more about how to load and compute a metric):
```py
import evaluate

wer = evaluate.load("wer")
```
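To get a feel for what the metric returns, you can call it directly on a pair of strings. This is only a sanity check with made-up sentences: the prediction below differs from the reference by one substituted word out of six, so the score is 1/6 ≈ 0.17:

```py
# Toy example with hypothetical strings, not drawn from the dataset
predictions = ["the cat sat on the mat"]
references = ["the cat sat on a mat"]
print(wer.compute(predictions=predictions, references=references))  # 0.1666...
```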
Then create a function that passes your predictions and labels to [`~evaluate.EvaluationModule.compute`] to calculate the WER:
```py
import numpy as np


def compute_metrics(pred):
    pred_logits = pred.predictions
    pred_ids = np.argmax(pred_logits, axis=-1)

    # -100 is the index ignored by the loss; swap it for the pad token id so the labels can be decoded
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id

    pred_str = processor.batch_decode(pred_ids)
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)

    # Use a distinct name so the `wer` metric loaded above isn't shadowed
    wer_score = wer.compute(predictions=pred_str, references=label_str)
    return {"wer": wer_score}
```
Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
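As a preview of that step, here is a minimal sketch of how the function plugs into [`Trainer`]; `model`, `training_args`, the datasets, and `data_collator` are placeholders for objects defined elsewhere in the guide:

```py
from transformers import Trainer

# Sketch only: the other arguments are assumed to be set up later in the guide
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=data_collator,
    compute_metrics=compute_metrics,  # WER is reported at each evaluation
)
```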