This is important because the model has to predict the masked tokens, which also teaches it to predict how many tokens are missing.
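As a rough illustration, the sketch below assumes the sentence above refers to span masking in which a whole run of tokens is replaced by a single mask token (as in text-infilling objectives). The `<mask>` string and the `infill_span` helper are hypothetical names chosen for this example, not part of any library.

```python
MASK = "<mask>"  # hypothetical mask token string, assumed for illustration


def infill_span(tokens, start, length, mask=MASK):
    """Replace tokens[start:start+length] with ONE mask token.

    Returns the corrupted sequence and the tokens the model must recover.
    """
    if start < 0 or length < 0 or start + length > len(tokens):
        raise ValueError("span out of range")
    corrupted = tokens[:start] + [mask] + tokens[start + length:]
    target = tokens[start:start + length]
    return corrupted, target


tokens = "the quick brown fox jumps over the lazy dog".split()
corrupted, target = infill_span(tokens, start=1, length=3)
print(corrupted)  # the input the model would see
print(target)     # the span the model would have to reconstruct
```

Because the corrupted input carries one mask regardless of span length, the input alone does not reveal how many tokens were removed; under this assumption, the model has to infer both the content and the length of the missing span.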