Enabling both the scale_attn_by_inverse_layer_idx and reorder_and_upcast_attn flags applies the training stability
  improvements from Mistral (PyTorch only).
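
As a rough illustration of the first flag, the sketch below assumes that the two flags are GPT-2 style configuration options and that scale_attn_by_inverse_layer_idx divides the usual attention scaling by the 1-based layer position. The helper name `attention_scale` and the `stability_flags` dictionary are hypothetical, introduced only for this example. The upcasting behavior of reorder_and_upcast_attn is not modeled here.

```python
import math


def attention_scale(head_dim: int, layer_idx: int,
                    scale_attn_by_inverse_layer_idx: bool) -> float:
    """Return the factor multiplied into raw query-key scores before softmax.

    Hypothetical helper: assumes GPT-2 style attention with 0-based layer_idx.
    """
    # Standard scaled dot-product attention: 1 / sqrt(head_dim).
    scale = 1.0 / math.sqrt(head_dim)
    if scale_attn_by_inverse_layer_idx:
        # Assumed behavior: deeper layers get proportionally smaller scores.
        scale /= float(layer_idx + 1)
    return scale


# Hypothetical keyword arguments one might pass to a GPT-2 style config
# object to switch on both flags together.
stability_flags = {
    "scale_attn_by_inverse_layer_idx": True,
    "reorder_and_upcast_attn": True,
}

if __name__ == "__main__":
    for idx in range(4):
        print(idx, attention_scale(64, idx, True))
```

Under these assumptions the first layer keeps the usual 1/sqrt(head_dim) scaling, while later layers shrink their attention scores further, which is one way to keep score magnitudes bounded in deep stacks.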