Enabling the scale_attn_by_inverse_layer_idx and reorder_and_upcast_attn flags applies the training stability
improvements from Mistral, the Stanford CRFM training codebase (PyTorch only).
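
To make the first flag concrete, here is a minimal sketch of the extra scaling it introduces, assuming it divides the attention scores by the 1-based layer index on top of the usual 1/sqrt(head_dim) factor. The helper names attention_scale and scaled_scores are hypothetical and written in plain Python for illustration; they are not the library's implementation, and the float32 upcasting done by reorder_and_upcast_attn is not modelled here.

```python
import math


def attention_scale(head_dim, layer_idx, scale_by_inverse_layer_idx):
    """Return the multiplier applied to raw query-key dot products.

    Hypothetical helper: the standard factor is 1/sqrt(head_dim); with the
    flag on, the result is additionally divided by (layer_idx + 1), so
    deeper layers produce smaller attention logits.
    """
    scale = 1.0 / math.sqrt(head_dim)
    if scale_by_inverse_layer_idx:
        scale /= float(layer_idx + 1)
    return scale


def scaled_scores(queries, keys, scale):
    """Compute scaled dot-product scores for lists of equal-length vectors."""
    return [
        [scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in keys]
        for q in queries
    ]


if __name__ == "__main__":
    # Compare the multiplier with and without the flag for a few layers.
    for layer_idx in (0, 3, 11):
        plain = attention_scale(64, layer_idx, False)
        scaled = attention_scale(64, layer_idx, True)
        print(layer_idx, plain, scaled)
```

In the Hugging Face transformers library, both options are exposed as boolean keyword arguments of the same name on the GPT-2 configuration class, so they would normally be set there rather than computed by hand.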