Tips:

- Weights for the Llama 2 models can be obtained by filling out this form.
- The architecture is very similar to the first Llama, with the addition of Grouped Query Attention (GQA) following this paper.
- Setting `config.pretraining_tp` to a value different from 1 activates the more accurate but slower computation of the linear layers, which should better match the original logits (see the sketch after this list).
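
A minimal sketch of how the flag can be set before loading a model, assuming the Hugging Face `transformers` library and the `meta-llama/Llama-2-7b-hf` checkpoint (substitute whichever Llama 2 checkpoint you have access to):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the Llama 2 config and set pretraining_tp to a value other than 1
# to activate the slower but more accurate linear-layer computation.
config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
config.pretraining_tp = 2  # example value; any value != 1 enables the accurate path

# Load the model with the modified config.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    config=config,
)
```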