This is important because the model has to predict the masked tokens, and it teaches the model to predict the number of missing tokens. |
This is important because the model has to predict the masked tokens, and it teaches the model to predict the number of missing tokens. |