The PyTorch models can take past_key_values as input. This argument holds the previously computed key/value attention pairs, so they do not have to be recomputed for earlier tokens at each decoding step.
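To make the idea of reusing key/value pairs concrete, here is a minimal sketch in plain Python of a single attention head that decodes one token at a time. It is a toy illustration, not the library's implementation: `project`, `attend`, `step_with_cache`, and `step_without_cache` are hypothetical names, and the fixed arithmetic in `project` stands in for learned projection weights. Only the name `past_key_values` comes from the text above.

```python
import math


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def project(token):
    """Hypothetical stand-in for the learned query/key/value projections."""
    query = [x * 0.5 for x in token]
    key = [x + 1.0 for x in token]
    value = [x * 2.0 for x in token]
    return query, key, value


def step_without_cache(tokens):
    """Recompute keys and values for the whole prefix, then attend from the last token."""
    projected = [project(t) for t in tokens]
    keys = [p[1] for p in projected]
    values = [p[2] for p in projected]
    return attend(projected[-1][0], keys, values)


def step_with_cache(token, past_key_values):
    """Project only the new token and reuse the cached keys and values."""
    query, key, value = project(token)
    past_keys, past_values = past_key_values
    keys = past_keys + [key]
    values = past_values + [value]
    return attend(query, keys, values), (keys, values)


tokens = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4]]
past_key_values = ([], [])  # empty cache before the first token
for i, token in enumerate(tokens):
    cached_out, past_key_values = step_with_cache(token, past_key_values)
    full_out = step_without_cache(tokens[: i + 1])
    # Both paths produce the same attention output for the newest token.
    assert all(math.isclose(a, b) for a, b in zip(cached_out, full_out))
```

In this sketch the cached path projects one token per step, while the uncached path re-projects the entire prefix every time. That saved work is what passing the cache back into the model is meant to provide.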