It presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT:
Splitting the embedding matrix into two smaller matrices.
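To see why splitting the embedding matrix saves parameters, here is a minimal sketch. It assumes the usual factorization in which one vocabulary-by-hidden matrix (V x H) is replaced by a vocabulary-by-embedding matrix (V x E) followed by an embedding-to-hidden projection (E x H), with E much smaller than H. The function names and the sizes V = 30000, H = 768, E = 128 are hypothetical, chosen only for illustration; this is not the library's implementation.

```python
def full_embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters in a single V x H embedding matrix (unfactorized, BERT-style)."""
    return vocab_size * hidden_size


def factorized_embedding_params(vocab_size: int, embedding_size: int, hidden_size: int) -> int:
    """Parameters when the V x H matrix is split into a V x E table and an E x H projection."""
    return vocab_size * embedding_size + embedding_size * hidden_size


def factorized_lookup(token_id: int, small_table: list, projection: list) -> list:
    """Hypothetical helper: fetch the length-E row for a token, then project it to length H."""
    e = small_table[token_id]  # length-E embedding vector
    hidden_size = len(projection[0])
    # Vector-matrix product: h[j] = sum_k e[k] * projection[k][j]
    return [sum(e[k] * projection[k][j] for k in range(len(e))) for j in range(hidden_size)]


if __name__ == "__main__":
    V, H, E = 30000, 768, 128  # illustrative sizes (assumption)
    print(full_embedding_params(V, H))           # 23040000
    print(factorized_embedding_params(V, E, H))  # 3938304

    # Tiny worked example: V=3, E=2, H=4
    small_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    projection = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
    print(factorized_lookup(2, small_table, projection))  # [1.5, 2.5, 3.5, 4.5]
```

With these illustrative sizes the factorized form needs roughly 3.9M embedding parameters instead of about 23M. The cost grows with V x E rather than V x H, so the hidden size can be increased without enlarging the vocabulary table.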