The final hidden states at the masked token positions are passed to a feedforward network with a softmax over the vocabulary to predict the masked word.
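The prediction step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the model's actual implementation: the feedforward network is reduced to a single linear layer, and the names `predict_masked`, `linear`, `softmax`, the toy vocabulary, and the random weights are all assumptions made for the example.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weight, bias):
    """Apply one linear layer; `weight` holds one row per output unit."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]


def predict_masked(hidden_states, mask_positions, weight, bias, vocab):
    """Predict a word for each masked position from its final hidden state.

    Hypothetical helper: only the masked positions are scored, and each
    score vector has one entry per vocabulary word.
    """
    predictions = {}
    for pos in mask_positions:
        probs = softmax(linear(hidden_states[pos], weight, bias))
        best = max(range(len(probs)), key=probs.__getitem__)
        predictions[pos] = (vocab[best], probs[best])
    return predictions


# Toy setup (assumed values): 5 tokens, hidden size 3, 4-word vocabulary.
random.seed(0)
vocab = ["cat", "dog", "sat", "mat"]
hidden_size = 3
hidden_states = [
    [random.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(5)
]
weight = [[random.uniform(-1, 1) for _ in range(hidden_size)] for _ in vocab]
bias = [0.0] * len(vocab)

# Pretend the token at index 2 was replaced by the mask token.
predictions = predict_masked(hidden_states, [2], weight, bias, vocab)
print(predictions)
```

The result maps each masked position to the most probable vocabulary word and its probability. In a trained model the weights are learned rather than random, and the hidden states come from the encoder rather than a random generator.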