Inference doesn't require any additional memory for optimizer states and gradients, so you can fit much larger batch sizes and/or sequence lengths on the same hardware.
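To make the difference concrete, here is a rough back-of-the-envelope calculation (an illustrative sketch, not an exact accounting: it assumes fp32 weights, an Adam-style optimizer with two state tensors per parameter, and ignores activations, KV caches, and framework overhead):

```python
def training_memory_gb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate resident memory for training: weights + gradients
    + two Adam moment tensors per parameter (all in fp32)."""
    weights = num_params * bytes_per_param
    gradients = num_params * bytes_per_param
    optimizer_states = 2 * num_params * bytes_per_param
    return (weights + gradients + optimizer_states) / 1e9


def inference_memory_gb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate resident memory for inference: only the weights;
    no gradients or optimizer states are kept."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    n = 1_000_000_000  # hypothetical 1B-parameter model
    print(f"training (fp32 + Adam): {training_memory_gb(n):.0f} GB")
    print(f"inference (fp32):       {inference_memory_gb(n):.0f} GB")
```

Under these assumptions a 1B-parameter fp32 model needs roughly 16 GB just for weights, gradients, and optimizer states during training, but only about 4 GB of weights at inference time, which is why the freed-up memory can go toward larger batches or longer sequences.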