Enabling the scale_attn_by_inverse_layer_idx and reorder_and_upcast_attn flags applies the training-stability
improvements from Stanford CRFM's Mistral training project. Both flags are supported only by the PyTorch implementation.
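To give a rough sense of what the two flags change, the sketch below computes a single attention logit in plain Python. It is a simplified model, not the Transformers implementation. attn_logit and to_fp16 are hypothetical helpers, fp16 storage is emulated with the struct module's half-precision "e" format, and Python floats stand in for fp32. Following the GPT2Config docstrings, scale_attn_by_inverse_layer_idx adds an extra 1/(layer_idx + 1) factor, and reorder_and_upcast_attn scales the keys before the dot product and computes it at higher precision. In Transformers itself you would pass both flags to the config, e.g. GPT2Config(scale_attn_by_inverse_layer_idx=True, reorder_and_upcast_attn=True).

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE-754 half precision; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the fp16 range; a GPU would produce inf
        return math.copysign(math.inf, x)


def attn_logit(q, k, layer_idx,
               scale_attn_by_inverse_layer_idx=False,
               reorder_and_upcast_attn=False):
    """Attention logit for one query/key pair (hypothetical helper, not library code)."""
    scale = 1.0 / math.sqrt(len(q))      # usual 1/sqrt(head_dim) factor
    if scale_attn_by_inverse_layer_idx:
        scale /= float(layer_idx + 1)    # extra 1/(layer_idx + 1) factor
    if reorder_and_upcast_attn:
        # Scale the key first, then take the dot product in full precision
        # (Python floats are fp64 here, standing in for fp32).
        return sum(qi * (ki * scale) for qi, ki in zip(q, k))
    # Mixed-precision path: the raw q.k product is stored in fp16, then scaled.
    raw = to_fp16(sum(qi * ki for qi, ki in zip(q, k)))
    return to_fp16(raw * scale)


q, k = [100.0] * 64, [20.0] * 64  # q.k = 128000, above the fp16 max of 65504
print(attn_logit(q, k, layer_idx=3, scale_attn_by_inverse_layer_idx=True))  # inf
print(attn_logit(q, k, layer_idx=3, scale_attn_by_inverse_layer_idx=True,
                 reorder_and_upcast_attn=True))  # 4000.0
```

In this sketch the layer-index scaling alone does not prevent the overflow on the fp16 path, because the large raw product is stored before any scaling is applied. Scaling first and accumulating at higher precision keeps the logit finite.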