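Like other classification heads, the MLP head converts the model's output into logits over the class labels. During training, the cross-entropy loss between these logits and the true labels is computed; at inference, the most likely class is the one with the highest logit.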
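The following is a minimal, dependency-free sketch of what such a head does, under stated assumptions. The head is reduced to a single linear layer (a real MLP head may stack more layers with nonlinearities). The hidden vector, weights, and bias are made-up numbers. The helper names `linear`, `log_softmax`, `cross_entropy`, and `predict` are hypothetical. The sketch shows how logits are produced, how cross-entropy scores them against a label, and how the predicted class is read off with an argmax.

```python
import math
from typing import List, Sequence


def linear(x: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Project a hidden vector to one logit per class: logits = W @ x + b."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b for row, b in zip(weight, bias)]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax (subtract the max before exponentiating)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Training loss: negative log-probability assigned to the true class."""
    return -log_softmax(logits)[label]


def predict(logits: Sequence[float]) -> int:
    """Inference: the most likely class is the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical example: a 4-dim hidden state fed to a 3-class linear head with made-up weights.
hidden = [0.5, -1.0, 2.0, 0.1]
weight = [
    [0.2, 0.1, -0.3, 0.0],
    [-0.1, 0.4, 0.5, 0.2],
    [0.3, -0.2, 0.1, 0.1],
]
bias = [0.0, 0.1, -0.1]

logits = linear(hidden, weight, bias)
loss = cross_entropy(logits, label=1)
print("predicted class:", predict(logits), "loss:", round(loss, 4))
```

Note that the loss only matters for training. Prediction needs just the argmax of the logits, since softmax preserves their ordering.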