Tips:

- Weights for the Llama 2 models can be obtained by filling out this form.
- The architecture is very similar to the first Llama, with the addition of Grouped Query Attention (GQA) following this paper.
- Setting `config.pretraining_tp` to a value different from 1 activates the more accurate but slower computation of the linear layers, which should better match the original logits (see the sketch after this list).
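
A minimal sketch of how the flag can be set before loading a model, assuming the Hugging Face `transformers` library and the `meta-llama/Llama-2-7b-hf` checkpoint (substitute whichever Llama 2 checkpoint you have access to):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the Llama 2 config and set pretraining_tp to a value other than 1
# to activate the slower but more accurate linear-layer computation.
config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
config.pretraining_tp = 2  # example value; any value != 1 enables the accurate path

# Load the model with the modified config.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    config=config,
)
```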