Fine-tuning the model in float16 is not recommended, as it is known to produce NaN values. Fine-tune the model in bfloat16 instead.
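The source does not say why float16 produces NaN. One common cause, assumed here for illustration, is float16's narrow range: its largest finite value is 65504, so a larger intermediate value overflows to infinity, and a later operation such as inf - inf yields NaN. bfloat16 keeps float32's exponent range at the cost of precision. The sketch below emulates both formats with the standard library only; `to_float16` and `to_bfloat16` are hypothetical helpers written for this example, not part of any library, and the bfloat16 emulation truncates rather than rounds.

```python
import math
import struct


def to_float16(x: float) -> float:
    """Round x to IEEE 754 half precision; overflow becomes +/-inf, as on hardware."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct raises instead of saturating, so map overflow to infinity here.
        return math.copysign(math.inf, x)


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 encoding.

    Assumption: truncation (round toward zero) for simplicity; real hardware
    usually rounds to nearest even.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


activation = 70000.0  # hypothetical intermediate value above the float16 maximum

half = to_float16(activation)    # overflows to inf
brain = to_bfloat16(activation)  # stays finite, but coarsely rounded

half_diff = half - half     # inf - inf gives nan
brain_diff = brain - brain  # 0.0

print("float16: ", half, half_diff)
print("bfloat16:", brain, brain_diff)
```

In a PyTorch setup, following the recommendation typically means selecting `torch.bfloat16` rather than `torch.float16` as the training dtype.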