generated_output = model.generate(**tokens, use_cache=True, max_new_tokens=10) tokenizer.batch_decode(generated_output)[0] 'If I were an AI that had just achieved a breakthrough in machine learning, I would be thrilled' Combining Phi and Flash Attention 2 First, make sure to install the latest version of Flash Attention 2 to include the sliding window attention feature.