```py
import torch

# `model` and `input_ids` carry over from the previous example
attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 0]])
output = model(input_ids, attention_mask=attention_mask)
print(output.logits)
```

```text
tensor([[ 0.0082, -0.2307],
        [-0.1008, -0.4061]], grad_fn=<AddmmBackward0>)
```

🤗 Transformers doesn't automatically create an `attention_mask` to mask a padding token if it is provided because:

- Some models don't have a padding token.
- For some use-cases, users want a model to attend to a padding token.
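
In practice you rarely build the mask by hand: when the tokenizer pads a batch, it returns the matching `attention_mask` alongside the `input_ids`. Here is a minimal sketch, assuming the `distilbert-base-uncased-finetuned-sst-2-english` checkpoint (not named in this section; any sequence-classification checkpoint with a padding token behaves the same way):

```py
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed checkpoint for illustration
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sequences = ["I love this movie!", "Meh."]

# padding=True pads the shorter sequence to the length of the longer one
# and returns an attention_mask with 0s over the padded positions
batch = tokenizer(sequences, padding=True, return_tensors="pt")
print(batch["attention_mask"])

with torch.no_grad():
    output = model(**batch)
print(output.logits)
```

Because the mask zeros out the padded positions, the logits for the shorter sequence should match (up to numerical noise) what you would get by running it through the model on its own, unpadded.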