These labels are different according to the model head, for example: For sequence classification models, ([BertForSequenceClassification]), the model expects a tensor of dimension (batch_size) with each value of the batch corresponding to the expected label of the entire sequence.