For masked language modeling ([BertForMaskedLM]), the model expects a tensor of dimension (batch_size,
  seq_length) in which each value is the expected label of the corresponding token: the token ID at each masked
  position, and a value to be ignored (usually -100) everywhere else.
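
The label layout described above can be sketched in plain Python. This is a minimal illustration, not the library's own masking code: the helper `build_mlm_labels`, the example token IDs, and the use of 103 as the mask token ID are all assumptions made for the example. In practice the nested lists would be converted to tensors before being passed to the model.

```python
IGNORE_INDEX = -100  # label value assumed to be skipped by the loss


def build_mlm_labels(input_ids, masked_positions, mask_token_id):
    """Hypothetical helper: return (masked_input_ids, labels).

    Both outputs are nested lists shaped (batch_size, seq_length).
    """
    masked_inputs, labels = [], []
    for row, positions in zip(input_ids, masked_positions):
        positions = set(positions)
        # Replace the chosen positions with the mask token in the inputs.
        masked_inputs.append(
            [mask_token_id if i in positions else tok for i, tok in enumerate(row)]
        )
        # Keep the original token ID only at masked positions; ignore the rest.
        labels.append(
            [tok if i in positions else IGNORE_INDEX for i, tok in enumerate(row)]
        )
    return masked_inputs, labels


# Made-up token IDs for two sequences of length 5 (batch_size=2, seq_length=5).
input_ids = [
    [101, 2023, 2003, 1037, 102],
    [101, 7592, 2088, 102, 0],
]
masked_positions = [[2], [1, 2]]  # positions to mask in each sequence

masked_inputs, labels = build_mlm_labels(input_ids, masked_positions, mask_token_id=103)
print(labels)
```

Only the masked positions carry a real token ID in `labels`, so only those positions would contribute to the loss under the assumption that -100 is the ignored value.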