```python
>>> # `model`, `tokenizer`, and `tokens` are assumed to be defined in the preceding example
>>> generated_output = model.generate(**tokens, use_cache=True, max_new_tokens=10)
>>> tokenizer.batch_decode(generated_output)[0]
'If I were an AI that had just achieved a breakthrough in machine learning, I would be thrilled'
```
## Combining Phi and Flash Attention 2
First, make sure to install the latest version of Flash Attention 2 to include the sliding window attention feature.
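One common way to install it is from PyPI, following the `flash-attn` project's recommended command; the `--no-build-isolation` flag lets the package build its CUDA kernels against your already-installed PyTorch (exact requirements may vary with your CUDA and PyTorch versions):

```bash
pip install -U flash-attn --no-build-isolation
```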