LLaMA-2-7B-32K / modeling_flash_llama.py

Commit History

Fix RuntimeError: pad attn scores back to original query sequence length, instead of unpadded sequence length (i.e. no change).
e6c58da

Birchlabs committed on

Correct the output dtype of rmsnorm_func (#13)
aef6d89

juewang ag0 committed on

remove torch.jit
4ec6edc

juewang committed on

init
cf6ad2b

juewang committed on