# RemBERT

## Overview

The RemBERT model was proposed in [Rethinking Embedding Coupling in Pre-trained Language Models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, and Sebastian Ruder.

The abstract from the paper is the following:

*We re-evaluate the standard practice of sharing weights between input and output embeddings in state-of-the-art pre-trained language models. We show that decoupled embeddings provide increased modeling flexibility, allowing us to significantly improve the efficiency of parameter allocation in the input embedding of multilingual models. By reallocating the input embedding parameters in the Transformer layers, we achieve dramatically better performance on standard natural language understanding tasks with the same number of parameters during fine-tuning. We also show that allocating additional capacity to the output embedding provides benefits to the model that persist through the fine-tuning stage even though the output embedding is discarded after pre-training. Our analysis shows that larger output embeddings prevent the model's last layers from overspecializing to the pre-training task and encourage Transformer representations to be more general and more transferable to other tasks and languages. Harnessing these findings, we are able to train models that achieve strong performance on the XTREME benchmark without increasing the number of parameters at the fine-tuning stage.*

## Usage tips

For fine-tuning, RemBERT can be thought of as a bigger version of mBERT with an ALBERT-like factorization of the embedding layer. The embeddings are not tied in pre-training, in contrast with BERT, which enables smaller input embeddings (preserved during fine-tuning) and bigger output embeddings (discarded at fine-tuning). The tokenizer is also similar to the ALBERT one rather than the BERT one.
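
Below is a minimal masked language modeling sketch, assuming the `google/rembert` checkpoint released with the paper (the weights are large, so expect a sizable download):

```python
import torch
from transformers import AutoTokenizer, RemBertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("google/rembert")
model = RemBertForMaskedLM.from_pretrained("google/rembert")

# RemBERT uses an ALBERT-like SentencePiece tokenizer with a [MASK] token
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# locate the masked position and take the highest-scoring token there
mask_positions = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```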

## Resources

- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
- [Question answering task guide](../tasks/question_answering)
- [Causal language modeling task guide](../tasks/language_modeling)
- [Masked language modeling task guide](../tasks/masked_language_modeling)
- [Multiple choice task guide](../tasks/multiple_choice)

## RemBertConfig

[[autodoc]] RemBertConfig
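
The decoupling shows up directly in the configuration, which exposes separate `input_embedding_size` and `output_embedding_size` fields next to `hidden_size`. A quick back-of-the-envelope sketch of the parameter-allocation argument from the paper, assuming the default configuration values match the released checkpoint:

```python
from transformers import RemBertConfig

config = RemBertConfig()

# input embeddings are kept small and are preserved during fine-tuning
input_params = config.vocab_size * config.input_embedding_size
# output embeddings are large but discarded after pre-training
output_params = config.vocab_size * config.output_embedding_size
# a BERT-style tied embedding would cost vocab_size * hidden_size instead
tied_params = config.vocab_size * config.hidden_size

print(f"input embedding:  {input_params / 1e6:.0f}M parameters")
print(f"output embedding: {output_params / 1e6:.0f}M parameters")
print(f"tied embedding:   {tied_params / 1e6:.0f}M parameters")
```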

## RemBertTokenizer

[[autodoc]] RemBertTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary
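
As a small illustration of the helpers listed above, the sketch below (assuming the `google/rembert` checkpoint and an installed `sentencepiece`) shows that the special-token layout produced by calling the tokenizer on a sentence pair can be reassembled manually with `build_inputs_with_special_tokens`:

```python
from transformers import RemBertTokenizer

tokenizer = RemBertTokenizer.from_pretrained("google/rembert")

# encoding a pair adds special tokens and token type ids automatically
encoded = tokenizer("How old are you?", "I'm six.")
print(encoded["token_type_ids"])  # 0s for the first segment, 1s for the second

# the same input ids can be assembled from the documented helpers
ids_a = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("How old are you?"))
ids_b = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("I'm six."))
manual = tokenizer.build_inputs_with_special_tokens(ids_a, ids_b)
assert manual == encoded["input_ids"]
```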

## RemBertTokenizerFast

[[autodoc]] RemBertTokenizerFast
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary

## RemBertModel

[[autodoc]] RemBertModel
    - forward

## RemBertForCausalLM

[[autodoc]] RemBertForCausalLM
    - forward

## RemBertForMaskedLM

[[autodoc]] RemBertForMaskedLM
    - forward

## RemBertForSequenceClassification

[[autodoc]] RemBertForSequenceClassification
    - forward

## RemBertForMultipleChoice

[[autodoc]] RemBertForMultipleChoice
    - forward

## RemBertForTokenClassification

[[autodoc]] RemBertForTokenClassification
    - forward

## RemBertForQuestionAnswering

[[autodoc]] RemBertForQuestionAnswering
    - forward
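
All of the PyTorch head classes above follow the standard library API. As one concrete example, a minimal sequence classification forward pass; `num_labels=2` is a placeholder for your task, and the classification head is randomly initialized on top of the pre-trained encoder:

```python
import torch
from transformers import AutoTokenizer, RemBertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("google/rembert")
# the classification head is freshly initialized and would need fine-tuning
model = RemBertForSequenceClassification.from_pretrained("google/rembert", num_labels=2)

inputs = tokenizer("RemBERT decouples input and output embeddings.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # (1, num_labels)
```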

## TFRemBertModel

[[autodoc]] TFRemBertModel
    - call

## TFRemBertForMaskedLM

[[autodoc]] TFRemBertForMaskedLM
    - call

## TFRemBertForCausalLM

[[autodoc]] TFRemBertForCausalLM
    - call

## TFRemBertForSequenceClassification

[[autodoc]] TFRemBertForSequenceClassification
    - call

## TFRemBertForMultipleChoice

[[autodoc]] TFRemBertForMultipleChoice
    - call

## TFRemBertForTokenClassification

[[autodoc]] TFRemBertForTokenClassification
    - call

## TFRemBertForQuestionAnswering

[[autodoc]] TFRemBertForQuestionAnswering
    - call