If `num_key_value_heads == num_attention_heads`, the model uses Multi-Head Attention (MHA); if `num_key_value_heads == 1`, it uses Multi-Query Attention (MQA); otherwise it uses Grouped-Query Attention (GQA), in which groups of query heads share a single key/value head.
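The rule above can be sketched as a small helper. This is an illustrative function (the name `attention_variant` is not part of any library API), assuming only that `num_attention_heads` is evenly divisible by `num_key_value_heads`:

```python
def attention_variant(num_attention_heads: int, num_key_value_heads: int) -> str:
    """Classify the attention variant implied by the head counts."""
    if num_attention_heads % num_key_value_heads != 0:
        raise ValueError("num_attention_heads must be divisible by num_key_value_heads")
    if num_key_value_heads == num_attention_heads:
        # Every query head has its own key/value head.
        return "MHA"
    if num_key_value_heads == 1:
        # All query heads share a single key/value head.
        return "MQA"
    # Groups of (num_attention_heads // num_key_value_heads) query heads
    # each share one key/value head.
    return "GQA"

print(attention_variant(32, 32))  # MHA
print(attention_variant(32, 1))   # MQA
print(attention_variant(32, 8))   # GQA
```

With GQA, the key/value cache shrinks by a factor of `num_attention_heads // num_key_value_heads` (4x in the last call above) compared to MHA, which is the main motivation for the setting.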