To demonstrate that large language models can further advance the state of the art (SOTA), we train an 8.3 billion parameter transformer language model similar to GPT-2 and a 3.9 billion parameter model similar to BERT.
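To make a parameter count of this magnitude concrete, the sketch below shows how a GPT-2-style decoder's size follows from its depth and hidden dimension. The specific configuration used (72 layers, hidden size 3072, GPT-2's 50,257-token vocabulary, 1024-token context) is an illustrative assumption chosen because it lands near 8.3 billion parameters; it is not a figure stated in this passage.

```python
# Back-of-the-envelope parameter count for a GPT-2-style transformer decoder.
# The configuration values below are illustrative assumptions, not taken
# from the text above; they are chosen to land near 8.3B parameters.

def transformer_params(n_layers: int, hidden: int, vocab: int, seq_len: int) -> int:
    """Approximate parameter count of a GPT-2-style decoder."""
    # Per layer: ~4*h^2 for attention (Q, K, V, and output projections)
    # plus ~8*h^2 for the 4x-expansion MLP; biases and LayerNorm
    # parameters are negligible at this scale.
    per_layer = 12 * hidden * hidden
    # Token embeddings plus learned position embeddings.
    embeddings = vocab * hidden + seq_len * hidden
    return n_layers * per_layer + embeddings

# Hypothetical configuration in the 8.3B range.
n = transformer_params(n_layers=72, hidden=3072, vocab=50257, seq_len=1024)
print(f"{n:,}")  # ~8.3e9 parameters
```

The dominant term is the 12·h² cost per layer, so at fixed depth the count grows quadratically with the hidden dimension, which is why scaling models into the billions is driven mostly by widening and deepening the transformer stack rather than by the embedding tables.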