Using the GPT-2 model, we achieve SOTA results on the WikiText103 (10.8 compared to SOTA perplexity of 15.8) and LAMBADA (66.5% compared to SOTA accuracy of 63.2%) datasets.