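# Assumes `tokenizer`, `model`, and `prompt` are already defined (e.g. loaded with `from_pretrained`).
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sample a continuation; max_length counts the prompt tokens as well as the new ones.
gen_tokens = model.generate(
    input_ids,
    do_sample=True,   # sample from the distribution instead of greedy decoding
    temperature=0.9,  # values below 1.0 slightly sharpen the next-token distribution
    max_length=100,
)
gen_text = tokenizer.batch_decode(gen_tokens)[0]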
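Because `do_sample=True`, each new token is drawn at random from the model's next-token distribution, and `temperature` divides the logits before the softmax. The pure-Python sketch below illustrates that single step on a toy set of hypothetical logits. The function names are illustrative only, and the real `generate` pipeline may apply further logit processing (for example top-k or top-p filtering) depending on the generation configuration.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities after dividing them by `temperature`.

    temperature < 1 sharpens the distribution (favours the top token);
    temperature > 1 flattens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=1.0, rng=random):
    """Draw one token id from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
print(softmax_with_temperature(logits, 0.9))
print(sample_next_token(logits, 0.9, random.Random(0)))
```

With `temperature=0.9` the most likely token becomes a little more likely than at `temperature=1.0`, while every token keeps a non-zero chance of being sampled.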
Using Flash Attention 2
Flash Attention 2 is a faster, more memory-efficient implementation of the model's attention computation.
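The speed-up comes from how attention is scheduled rather than from changing the math: FlashAttention-style kernels process keys and values in tiles and merge the partial results with a running ("online") softmax, so the full matrix of attention scores is never written out. The pure-Python sketch below illustrates only that tiling and online-softmax idea, under the assumption of a single head and no masking; it is not the actual Flash Attention 2 CUDA kernel, and the helper names are hypothetical.

```python
import math


def naive_attention(Q, K, V):
    """Reference: softmax(Q K^T / sqrt(d)) V, building each full score row."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def tiled_attention(Q, K, V, block_size=2):
    """Same result, visiting K/V in blocks with a running (online) softmax.

    Per query we keep only a running max `m`, a running normaliser `l`,
    and an unnormalised output accumulator `acc`.
    """
    d = len(Q[0])
    dv = len(V[0])
    out = []
    for q in Q:
        m = -math.inf
        l = 0.0
        acc = [0.0] * dv
        for start in range(0, len(K), block_size):
            K_blk = K[start:start + block_size]
            V_blk = V[start:start + block_size]
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K_blk]
            m_new = max(m, max(scores))
            # Rescale everything accumulated so far to the new running max.
            scale = math.exp(m - m_new)
            l *= scale
            acc = [a * scale for a in acc]
            for s, v in zip(scores, V_blk):
                w = math.exp(s - m_new)
                l += w
                for j in range(dv):
                    acc[j] += w * v[j]
            m = m_new
        out.append([a / l for a in acc])
    return out


# Toy inputs: 2 queries, 5 keys/values, head dimension 2.
Q = [[1.0, 0.0], [0.5, -0.5]]
K = [[1.0, 2.0], [0.0, 1.0], [-1.0, 0.5], [2.0, -1.0], [0.3, 0.3]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
print(naive_attention(Q, K, V))
print(tiled_attention(Q, K, V, block_size=2))
```

Because the tiled version matches the naive one for any block size, an exact kernel of this kind changes speed and memory use rather than the attention output, up to floating-point rounding.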