ALBERT reduces memory consumption by lowering the number of parameters in two ways: factorizing the large vocabulary embedding matrix into two smaller matrices, and sharing parameters across layers.
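
To make the savings concrete, the arithmetic behind both techniques can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, the sizes (a 30,000-token vocabulary, hidden size 768, embedding size 128, 12 layers) are assumed example values, and the per-layer parameter count is a made-up round number rather than a measured one.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Count embedding parameters (hypothetical helper, for illustration only).

    Without factorization the embedding is one vocab_size x hidden_size matrix.
    With factorization it is a vocab_size x embedding_size matrix followed by
    an embedding_size x hidden_size projection.
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(params_per_layer, num_layers, shared):
    """Count encoder parameters; shared layers reuse one set of weights."""
    return params_per_layer if shared else params_per_layer * num_layers


# Assumed example sizes, not taken from the text above.
V, H, E = 30_000, 768, 128

full = embedding_params(V, H)           # single V x H matrix
factorized = embedding_params(V, H, E)  # V x E matrix plus E x H projection

# Assumed placeholder: 7,000,000 parameters per layer, 12 layers.
unshared = encoder_params(7_000_000, 12, shared=False)
shared = encoder_params(7_000_000, 12, shared=True)

print(f"embedding: {full:,} -> {factorized:,}")
print(f"encoder:   {unshared:,} -> {shared:,}")
```

Under these assumptions the embedding shrinks from 23,040,000 to 3,938,304 parameters, and the encoder stores one layer's worth of weights instead of twelve. Note that sharing reduces stored parameters, not the amount of computation, since every layer is still executed.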