Training the model in float16 is not recommended because it is known to produce NaN values; train the model in bfloat16 instead.
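The note does not say why float16 fails. A common explanation, which is an assumption here, is numeric range. float16's largest finite value is 65504. bfloat16 keeps float32's 8-bit exponent (maximum about 3.4e38) and gives up mantissa precision in exchange. Once an intermediate value overflows to inf, an operation such as inf - inf produces NaN, and that NaN then spreads into the loss. The stdlib-only Python sketch below emulates both formats by rounding every intermediate result and runs a toy attention-score computation. The helper names (`to_float16`, `to_bfloat16`, `dot`, `attention_probs`) and the input values are hypothetical. They are chosen only to trigger the overflow and are not taken from the model's code.

```python
import math
import struct


def to_float16(x: float) -> float:
    """Round x to IEEE 754 half precision (float16); max finite value is 65504."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite values outside float16's range; float16
        # arithmetic overflows to +/-inf instead, so mimic that here.
        return math.copysign(math.inf, x)


def to_bfloat16(x: float) -> float:
    """Round x to bfloat16: the top 16 bits of a float32, round-to-nearest-even.

    bfloat16 keeps float32's 8-bit exponent, so its range matches float32
    (about 3.4e38), but it has only 8 significant bits.
    """
    if math.isnan(x):
        return x
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 low bits that are about to be dropped.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(a, b, cast):
    """Dot product that rounds every product and partial sum with `cast`."""
    acc = cast(0.0)
    for x, y in zip(a, b):
        acc = cast(acc + cast(cast(x) * cast(y)))
    return acc


def attention_probs(query, keys, cast):
    """Toy attention: dot-product scores, then a max-subtracted softmax."""
    scores = [dot(query, k, cast) for k in keys]
    m = max(scores)
    exps = [cast(math.exp(cast(s - m))) for s in scores]
    total = cast(sum(exps))
    return [cast(e / total) for e in exps]


# Hypothetical activations, large enough that the first score exceeds 65504.
query = [200.0, 200.0]
keys = [[200.0, 200.0], [1.0, 1.0]]

print("float16: ", attention_probs(query, keys, to_float16))   # [nan, nan]
print("bfloat16:", attention_probs(query, keys, to_bfloat16))  # [1.0, 0.0]
```

In the float16 run, the first score (80000) overflows to inf. Subtracting the maximum then computes inf - inf, so every probability becomes NaN. The bfloat16 run loses some precision (40000 rounds to 39936) but stays finite and matches the full-precision result. If you train with the Hugging Face `Trainer`, bfloat16 mixed precision is selected with `TrainingArguments(bf16=True)` rather than `fp16=True`. This requires hardware that supports bfloat16.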