In this case, you can halve per_device_train_batch_size step by step and double gradient_accumulation_steps each time to compensate, so that the effective batch size (the product of the two) stays the same.
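
The arithmetic behind this adjustment can be sketched in a few lines of plain Python. This is only an illustration: the helper name halving_schedule and the starting values 32 and 1 are assumptions, not something taken from the text above. The sketch lists each candidate pair along with the per-device effective batch size, which should not change from one step to the next.

```python
def halving_schedule(per_device_train_batch_size, gradient_accumulation_steps=1):
    """Yield (batch_size, accumulation_steps) pairs, halving the first
    and doubling the second so that their product stays constant.

    Hypothetical helper for illustration only.
    """
    batch_size = per_device_train_batch_size
    accumulation = gradient_accumulation_steps
    while batch_size >= 1:
        yield batch_size, accumulation
        if batch_size % 2 != 0:
            break  # an odd batch size (including 1) cannot be halved evenly
        batch_size //= 2
        accumulation *= 2


# Assumed starting point: batch size 32 with no gradient accumulation.
schedule = list(halving_schedule(32, 1))
for batch_size, accumulation in schedule:
    effective = batch_size * accumulation  # per-device effective batch size
    print(f"per_device_train_batch_size={batch_size}, "
          f"gradient_accumulation_steps={accumulation}, "
          f"effective={effective}")
```

In practice you would try the pairs in order and stop at the first one that fits in memory, then pass that pair to your training configuration.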