This is important because the model has to predict the masked tokens, and doing so also teaches it to predict how many tokens are missing.
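
To make the point concrete, here is a minimal sketch. It assumes the masking scheme is span infilling, where each removed span is replaced by a single mask token; the text above does not say so explicitly. The `MASK` symbol and the `infill_spans` helper are hypothetical names used only for illustration. Because the corrupted input shows one mask per span regardless of span length, the model can only reconstruct the original if it also infers how many tokens were removed.

```python
MASK = "<mask>"  # hypothetical mask symbol, not taken from any specific library


def infill_spans(tokens, span_starts, span_lengths):
    """Replace each chosen span with a single MASK token.

    Returns the corrupted sequence and the list of removed spans
    (the targets the model would have to reconstruct).
    Assumes spans do not overlap and every length is at least 1.
    """
    if any(length < 1 for length in span_lengths):
        raise ValueError("span lengths must be >= 1 in this sketch")

    spans = dict(zip(span_starts, span_lengths))
    corrupted, targets = [], []
    i = 0
    while i < len(tokens):
        if i in spans:
            length = spans[i]
            corrupted.append(MASK)                 # one mask, whatever the span length
            targets.append(tokens[i:i + length])   # what the model must recover
            i += length
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, targets


tokens = "the cat sat on the mat".split()
corrupted, targets = infill_spans(tokens, span_starts=[1], span_lengths=[3])
print(corrupted)  # ['the', '<mask>', 'the', 'mat']
print(targets)    # [['cat', 'sat', 'on']]
```

A one-token span and a three-token span would both appear as a single `<mask>` in the input, so the span length is part of what has to be predicted.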