In this case, you can reduce | |
the per_device_train_batch_size incrementally by factors of 2 and increase gradient_accumulation_steps by 2x to compensate. |
In this case, you can reduce | |
the per_device_train_batch_size incrementally by factors of 2 and increase gradient_accumulation_steps by 2x to compensate. |