LLaMA-2-7B-32K / modeling_flash_llama.py

Commit History

Fix RuntimeError: pad attn scores back to original query sequence length, instead of unpadded sequence length (i.e. no change).
e6c58da

Birchlabs committed on

Correct the output dtype of rmsnorm_func (#13)
aef6d89

juewang ag0 committed on

remove torch.jit
4ec6edc

juewang committed on

init
cf6ad2b

juewang committed on