The transformers library also follows this convention for consistency with PyTorch.