input_ids = tokenizer(prompt, return_tensors="pt").input_ids gen_tokens = model.generate( input_ids, do_sample=True, temperature=0.9, max_length=100, ) gen_text = tokenizer.batch_decode(gen_tokens)[0] Using Flash Attention 2 Flash Attention 2 is an faster, optimized version of the model.