For masked language modeling ([BertForMaskedLM]), the model expects a tensor of dimension (batch_size, seq_length), with each value giving the expected label of the corresponding token: the original token ID at each masked position, and a value to be ignored (usually -100) everywhere else.
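
As a rough illustration of that label layout, the sketch below builds masked inputs and matching labels from a batch of token IDs in plain Python. The helper name `mask_tokens`, the `mask_prob` and `seed` parameters, and the example token IDs (including 103 as the mask token) are assumptions made for this example, not part of any library API. It is also deliberately simplified: it may mask special tokens and always substitutes the mask token, which a real data collator would typically avoid. In practice the nested lists would be converted to tensors of shape (batch_size, seq_length) before being passed to the model.

```python
import random

IGNORE_INDEX = -100  # label value assumed to be skipped by the loss


def mask_tokens(input_ids, mask_token_id, mask_prob=0.15, seed=0):
    """Hypothetical helper: return (masked_inputs, labels) for a batch.

    input_ids is a list of equal-length lists of token IDs, i.e. a
    (batch_size, seq_length) layout expressed as nested lists.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    masked_inputs, labels = [], []
    for sequence in input_ids:
        masked_row, label_row = [], []
        for token_id in sequence:
            if rng.random() < mask_prob:
                # Masked position: hide the token, keep its ID as the label.
                masked_row.append(mask_token_id)
                label_row.append(token_id)
            else:
                # Unmasked position: leave the input as-is, ignore in the loss.
                masked_row.append(token_id)
                label_row.append(IGNORE_INDEX)
        masked_inputs.append(masked_row)
        labels.append(label_row)
    return masked_inputs, labels


# Illustrative token IDs only; 103 stands in for the mask token.
batch = [[101, 7592, 2088, 102], [101, 2129, 2024, 102]]
masked_inputs, labels = mask_tokens(batch, mask_token_id=103, mask_prob=0.5)
```

The labels keep the same shape as the inputs, so every position has an entry; only the positions that were masked carry a real token ID and contribute to the loss.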