Tips:
- Weights for the Llama2 models can be obtained by filling out this form.
- The architecture is very similar to the first Llama, with the addition of Grouped Query Attention (GQA) following this paper.
- Setting `config.pretraining_tp` to a value other than 1 activates the more accurate but slower computation of the linear layers, which should better match the original logits (see the sketch below).
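As a rough illustration, the snippet below shows one way to set `pretraining_tp` when loading a model. The checkpoint id `meta-llama/Llama-2-7b-hf` is only an example and requires gated access, and the value 2 is arbitrary; this is a minimal sketch, not the only way to configure it.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Example checkpoint id; substitute the Llama2 checkpoint you have access to.
checkpoint = "meta-llama/Llama-2-7b-hf"

# pretraining_tp > 1 makes the linear layers replay the tensor-parallel slicing
# used during pretraining: slower, but closer to the original logits.
config = AutoConfig.from_pretrained(checkpoint, pretraining_tp=2)
model = AutoModelForCausalLM.from_pretrained(checkpoint, config=config)
```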