In this case, you can reduce per_device_train_batch_size incrementally by a factor of 2 and double gradient_accumulation_steps each time to compensate, so that the effective batch size stays the same.
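
The adjustment can be sketched in plain Python to show that the effective per-device batch size does not change. This is only an illustration: the helper names halve_batch_size and reduction_schedule are hypothetical, as is the starting point of 16 samples per device with no accumulation.

```python
def halve_batch_size(per_device_train_batch_size, gradient_accumulation_steps):
    """Return the (batch size, accumulation steps) pair after one halving step."""
    if per_device_train_batch_size % 2 != 0:
        raise ValueError("per_device_train_batch_size must be even to halve it exactly")
    return per_device_train_batch_size // 2, gradient_accumulation_steps * 2


def reduction_schedule(per_device_train_batch_size, gradient_accumulation_steps):
    """List every setting reachable by repeated halving, largest batch size first."""
    schedule = [(per_device_train_batch_size, gradient_accumulation_steps)]
    while schedule[-1][0] % 2 == 0:
        schedule.append(halve_batch_size(*schedule[-1]))
    return schedule


# Hypothetical starting point: 16 samples per device, no accumulation.
for batch_size, accum_steps in reduction_schedule(16, 1):
    # The product is the effective per-device batch size; it stays constant.
    print(batch_size, accum_steps, batch_size * accum_steps)
```

In practice you would try each pair in order and stop at the first one that fits in memory, then pass those two values to your training configuration.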