Tips:

- Weights for the Llama2 models can be obtained by filling out this form.
- The architecture is very similar to the first Llama, with the addition of Grouped Query Attention (GQA) following this paper (see the sketch after this list).
- Setting `config.pretraining_tp` to a value other than 1 activates a more accurate but slower computation of the linear layers, which should better match the original logits (see the loading example below).
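Grouped Query Attention shares each key/value head across several query heads, which shrinks the KV cache relative to full multi-head attention. Below is a minimal PyTorch sketch of the mechanism, not the library's implementation; tensor shapes and head counts are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, n_kv_heads):
    """q: (batch, n_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)."""
    n_heads = q.shape[1]
    group = n_heads // n_kv_heads  # number of query heads served by each KV head
    # Expand the shared KV heads so every query head has a matching KV pair
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

# Illustrative sizes: 8 query heads sharing 2 KV heads
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v, n_kv_heads=2)  # -> (1, 8, 16, 64)
```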
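As a usage sketch for the `pretraining_tp` tip, the snippet below loads a checkpoint with the setting enabled; the `meta-llama/Llama-2-7b-hf` checkpoint name and the value 2 are illustrative choices:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# pretraining_tp > 1 replays the tensor-parallel pretraining arithmetic:
# each linear layer is computed in `pretraining_tp` shards whose partial
# results are summed, which is slower but matches the original logits
# more closely.
config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
config.pretraining_tp = 2  # illustrative; any value > 1 enables this path
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", config=config)
```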