The abstract of the paper is as follows: Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering.