The only preprocessing you have to do is to take the argmax of our predicted logits: | |
import evaluate | |
metric = evaluate.load("accuracy") | |
def compute_metrics(eval_pred): | |
predictions = np.argmax(eval_pred.predictions, axis=1) | |
return metric.compute(predictions=predictions, references=eval_pred.label_ids) | |
A note on evaluation: | |
In the VideoMAE paper, the authors use the following evaluation strategy. |