Fine-tuning the model in float16 is not recommended and is known to produce NaN values; fine-tune the model in bfloat16 instead.
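The note above does not say why float16 produces NaN. One common cause, assumed here rather than taken from the source, is that float16 has a much narrower range than bfloat16 (largest finite value 65504), so large intermediate values overflow to infinity and later operations such as inf - inf yield NaN. The following is a minimal pure-Python sketch of that effect. The helper names `to_float16` and `to_bfloat16` and the value 70000.0 are hypothetical, and bfloat16 is approximated by truncating a float32 rather than by the round-to-nearest conversion real hardware typically uses.

```python
import math
import struct


def to_float16(x: float) -> float:
    """Round x to IEEE half precision; out-of-range values become +/-inf."""
    try:
        # The struct format "e" is IEEE 754 binary16 (float16).
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the float16 range;
        # hardware float16 arithmetic would saturate to infinity instead.
        return math.copysign(math.inf, x)


def to_bfloat16(x: float) -> float:
    """Approximate bfloat16 by keeping only the top 16 bits of a float32.

    Assumption: plain truncation, used here only for illustration.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


# Hypothetical intermediate value, slightly above the float16 maximum (65504).
activation = 70000.0

fp16_value = to_float16(activation)   # overflows to inf
bf16_value = to_bfloat16(activation)  # stays finite, with reduced precision

# Subtracting two overflowed values gives NaN, which then spreads
# through every later computation that uses it.
fp16_diff = fp16_value - fp16_value
bf16_diff = bf16_value - bf16_value

print("float16: ", fp16_value, "difference:", fp16_diff)
print("bfloat16:", bf16_value, "difference:", bf16_diff)
```

bfloat16 keeps the same 8 exponent bits as float32 and gives up mantissa bits instead, so it trades precision for range. In PyTorch the corresponding dtype is `torch.bfloat16`.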