REALM Overview The REALM model was proposed in REALM: Retrieval-Augmented Language Model Pre-Training by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang. It's a retrieval-augmented language model that firstly retrieves documents from a textual knowledge corpus and then utilizes retrieved documents to process question answering tasks. The abstract from the paper is the following: Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents. We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity. This model was contributed by qqaatw. The original code can be found here. RealmConfig [[autodoc]] RealmConfig RealmTokenizer [[autodoc]] RealmTokenizer - build_inputs_with_special_tokens - get_special_tokens_mask - create_token_type_ids_from_sequences - save_vocabulary - batch_encode_candidates RealmTokenizerFast [[autodoc]] RealmTokenizerFast - batch_encode_candidates RealmRetriever [[autodoc]] RealmRetriever RealmEmbedder [[autodoc]] RealmEmbedder - forward RealmScorer [[autodoc]] RealmScorer - forward RealmKnowledgeAugEncoder [[autodoc]] RealmKnowledgeAugEncoder - forward RealmReader [[autodoc]] RealmReader - forward RealmForOpenQA [[autodoc]] RealmForOpenQA - block_embedding_to - forward