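The PyTorch models can take past_key_values as input: the key/value attention pairs computed in previous forward passes. Passing them back in lets the model reuse those keys and values instead of recomputing them for tokens it has already processed.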
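
To make the idea concrete, here is a minimal sketch of key/value caching in plain PyTorch. It is a toy single-head attention layer, not the library's implementation: the class name `ToyCausalSelfAttention`, its projections, and the `(keys, values)` tuple layout of the cache are assumptions made for illustration. Only the argument name `past_key_values` comes from the text above.

```python
import torch
import torch.nn as nn


class ToyCausalSelfAttention(nn.Module):
    """Hypothetical single-head causal self-attention with an optional key/value cache."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x, past_key_values=None):
        # x holds only the NEW tokens: (batch, new_len, dim)
        q = self.q_proj(x)
        k = self.k_proj(x)
        v = self.v_proj(x)

        if past_key_values is not None:
            # Reuse cached keys/values instead of recomputing them for old tokens.
            past_k, past_v = past_key_values
            k = torch.cat([past_k, k], dim=1)
            v = torch.cat([past_v, v], dim=1)

        new_len, total_len = q.size(1), k.size(1)
        past_len = total_len - new_len

        # Causal mask: a query at absolute position p may attend to keys 0..p.
        q_pos = torch.arange(past_len, total_len).unsqueeze(1)
        k_pos = torch.arange(total_len).unsqueeze(0)
        mask = k_pos <= q_pos  # (new_len, total_len)

        scores = (q @ k.transpose(1, 2)) * self.scale
        scores = scores.masked_fill(~mask, float("-inf"))
        out = torch.softmax(scores, dim=-1) @ v

        # Return the updated cache so the caller can pass it to the next step.
        return out, (k, v)


torch.manual_seed(0)
layer = ToyCausalSelfAttention(dim=8).eval()
x = torch.randn(1, 5, 8)  # batch of 1, sequence of 5 tokens

with torch.no_grad():
    # One pass over the full sequence, no cache.
    full_out, _ = layer(x)

    # Same sequence in two steps: the first 4 tokens, then the last token with the cache.
    prefix_out, cache = layer(x[:, :4])
    last_out, cache = layer(x[:, 4:], past_key_values=cache)

# The cached step should reproduce the last position of the full pass.
print(torch.allclose(full_out[:, -1], last_out[:, 0], atol=1e-5))
```

The second call processes a single new token, yet its output matches the last position of the full pass, because the keys and values for the first four tokens are read from the cache. Real models keep one such cache per layer, and the exact container type can differ between library versions, so check the documentation of the model you are using.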