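The cross-entropy loss is calculated between the logits and the labels. The labels are just the input tokens shifted by one position, so the logits at position t are scored against the token at position t + 1 (the next token).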
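
To make the shift concrete, here is a minimal sketch in plain Python. It assumes the usual next-token setup, where the logits at position t are compared with the token at position t + 1. The function names (`log_softmax`, `next_token_loss`) and the toy values are illustrative and do not come from any particular library.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def next_token_loss(logits, token_ids):
    """Mean cross-entropy between logits[t] and token_ids[t + 1].

    logits:    one row of vocabulary scores per input position
    token_ids: the input token ids, same length as logits
    """
    if len(logits) != len(token_ids):
        raise ValueError("need one row of logits per input token")
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a label")

    # The last position has no next token to predict, so its logits are dropped.
    shifted_logits = logits[:-1]
    # The first token is never a prediction target, so it is dropped from the labels.
    shifted_labels = token_ids[1:]

    losses = [
        -log_softmax(row)[label]
        for row, label in zip(shifted_logits, shifted_labels)
    ]
    return sum(losses) / len(losses)


# Toy example (made-up values): vocabulary of 3 tokens, sequence of 3 tokens.
token_ids = [0, 2, 1]
logits = [
    [0.0, 0.0, 0.0],   # scored against token_ids[1]
    [0.0, 0.0, 0.0],   # scored against token_ids[2]
    [5.0, 1.0, -2.0],  # ignored: there is no next token
]
loss = next_token_loss(logits, token_ids)
```

Because the two scored positions have uniform logits over three tokens, the loss in this toy case equals log(3). Frameworks differ on where the shift happens (inside the model or in the data pipeline), so it is worth checking which convention a given codebase follows before shifting the labels yourself.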