To demonstrate that large language models can further advance the state of the art (SOTA), we train an 8.3 billion parameter transformer language model similar to GPT-2 and a 3.9 billion parameter model similar to BERT.