For an input of size [batch_size, sequence_length], the memory required to store the intermediate feed-forward
embeddings of shape [batch_size, sequence_length, config.intermediate_size] can account for a large fraction of
total memory use.
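
To make the scale concrete, the following is a rough back-of-the-envelope sketch rather than a measurement. The helper name `activation_bytes` and the example sizes (batch size 8, sequence length 512, hidden size 768, intermediate size 3072, 4-byte float32 elements) are assumptions chosen for illustration; it counts only the raw tensor storage for a single layer and ignores framework overhead.

```python
def activation_bytes(batch_size: int, sequence_length: int, feature_size: int,
                     bytes_per_element: int = 4) -> int:
    """Raw storage for one [batch_size, sequence_length, feature_size] tensor.

    bytes_per_element defaults to 4 (float32); this is an assumption,
    use 2 for float16/bfloat16.
    """
    return batch_size * sequence_length * feature_size * bytes_per_element


# Illustrative sizes only (assumed, not taken from any specific config).
batch_size, sequence_length = 8, 512
hidden_size, intermediate_size = 768, 3072

hidden = activation_bytes(batch_size, sequence_length, hidden_size)
intermediate = activation_bytes(batch_size, sequence_length, intermediate_size)

print(f"hidden states:       {hidden / 2**20:.0f} MiB")
print(f"intermediate states: {intermediate / 2**20:.0f} MiB")
print(f"ratio:               {intermediate / hidden:.0f}x")
```

Under these assumptions the intermediate tensor is larger than the hidden-state tensor by the factor intermediate_size / hidden_size, which is why it tends to dominate the activation memory of the feed-forward block.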