In this work, we present our techniques for training very large transformer models and implement a simple,
efficient intra-layer model parallel approach that enables training transformer models with billions of parameters.
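
To make "intra-layer model parallel" concrete, here is a minimal sketch of one way a single feed-forward block can be split across devices. It is an illustration under stated assumptions, not the implementation this work describes: the devices are simulated with plain Python lists, ReLU stands in for the activation, the sum over partial results stands in for an all-reduce, and every function name is hypothetical. The first weight matrix is split by columns and the second by rows, so each simulated device computes its share without communication until the final sum.

```python
# Hypothetical sketch: intra-layer (tensor) parallelism for one MLP block.
# All names are invented for illustration; devices are simulated in-process.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Element-wise ReLU (assumed activation for this sketch)."""
    return [[max(0.0, v) for v in row] for row in m]

def split_columns(m, parts):
    """Split a matrix into `parts` equal-width column blocks."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]

def split_rows(m, parts):
    """Split a matrix into `parts` equal-height row blocks."""
    height = len(m) // parts
    return [m[i * height:(i + 1) * height] for i in range(parts)]

def mlp_serial(x, w1, w2):
    """Reference computation on a single device: relu(x @ w1) @ w2."""
    return matmul(relu(matmul(x, w1)), w2)

def mlp_parallel(x, w1, w2, parts=2):
    """Same computation with w1 split by columns and w2 split by rows."""
    w1_shards = split_columns(w1, parts)
    w2_shards = split_rows(w2, parts)
    # Each simulated device works only on its own pair of shards.
    partials = [matmul(relu(matmul(x, a)), b) for a, b in zip(w1_shards, w2_shards)]
    # Summing the partial outputs plays the role of an all-reduce.
    out = partials[0]
    for p in partials[1:]:
        out = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(out, p)]
    return out
```

Because the activation is applied element-wise, applying it to each column block separately gives the same result as applying it to the full intermediate matrix, which is why `mlp_parallel` matches `mlp_serial` whenever the hidden width is divisible by `parts`.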