It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (a 7.7 percentage-point absolute improvement), MultiNLI accuracy to 86.7% (a 4.6 percentage-point absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (a 1.5-point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (a 5.1-point absolute improvement).