|
|
|
RoBERTa |
|
|
|
Overview |
|
The RoBERTa model was proposed in RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. It is based on Google's BERT model released in 2018.

It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining objective and training with much larger mini-batches and learning rates.
|
The abstract from the paper is the following: |
|
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.
|
This model was contributed by julien-c. The original code can be found here. |
|
Usage tips |
|
|
|
- This implementation is the same as [BertModel] with a tiny embeddings tweak as well as a setup for RoBERTa pretrained models.
- RoBERTa has the same architecture as BERT, but uses a byte-level BPE as a tokenizer (same as GPT-2) and a different pretraining scheme.
- RoBERTa doesn't have token_type_ids, so you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token tokenizer.sep_token (or </s>); see the snippet after these tips.
- Same as BERT with better pretraining tricks:
  - dynamic masking: tokens are masked differently at each epoch, whereas BERT does it once and for all
  - sentence packing: sentences are packed together to reach 512 tokens (so the sentences are in an order that may span several documents)
  - train with larger batches
  - use BPE with bytes as a subunit and not characters (because of unicode characters)
- CamemBERT is a wrapper around RoBERTa. Refer to this page for usage examples.
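
To see the segment handling concretely, here is a minimal sketch (it assumes the roberta-base checkpoint can be downloaded from the Hub):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# Encode a pair of segments: RoBERTa joins them with </s></s> instead of
# relying on token_type_ids the way BERT does.
encoding = tokenizer("First segment", "Second segment")
print(tokenizer.decode(encoding["input_ids"]))
# <s>First segment</s></s>Second segment</s>
```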
|
|
|
Resources |
|
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with RoBERTa. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource. |
|
|
|
Text classification

- A blog on Getting Started with Sentiment Analysis on Twitter using RoBERTa and the Inference API.
- A blog on Opinion Classification with Kili and Hugging Face AutoTrain using RoBERTa.
- A notebook on how to finetune RoBERTa for sentiment analysis. 🌎
- [RobertaForSequenceClassification] is supported by this example script and notebook.
- [TFRobertaForSequenceClassification] is supported by this example script and notebook.
- [FlaxRobertaForSequenceClassification] is supported by this example script and notebook.
- Text classification task guide
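
A minimal inference sketch for this use case (cardiffnlp/twitter-roberta-base-sentiment-latest is one community RoBERTa checkpoint fine-tuned for Twitter sentiment, matching the blog above; any sequence-classification checkpoint works the same way):

```python
from transformers import pipeline

# A RoBERTa checkpoint fine-tuned for sentiment analysis on tweets.
classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)
print(classifier("I love the new RoBERTa docs!"))
# e.g. [{'label': 'positive', 'score': ...}]
```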
|
|
|
Token classification

- [RobertaForTokenClassification] is supported by this example script and notebook.
- [TFRobertaForTokenClassification] is supported by this example script and notebook.
- [FlaxRobertaForTokenClassification] is supported by this example script.
- Token classification chapter of the 🤗 Hugging Face Course.
- Token classification task guide
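
A short sketch of the token classification interface (the head placed on top of roberta-base below is randomly initialized, so it needs fine-tuning before its predictions are meaningful; num_labels=5 is an arbitrary placeholder):

```python
import torch
from transformers import AutoTokenizer, RobertaForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
# The classification head is new and untrained until fine-tuned.
model = RobertaForTokenClassification.from_pretrained("roberta-base", num_labels=5)

inputs = tokenizer("Hugging Face is based in New York City", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (batch_size, sequence_length, num_labels)
print(logits.shape)
```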
|
|
|
Masked language modeling

- A blog on How to train a new language model from scratch using Transformers and Tokenizers with RoBERTa.
- [RobertaForMaskedLM] is supported by this example script and notebook.
- [TFRobertaForMaskedLM] is supported by this example script and notebook.
- [FlaxRobertaForMaskedLM] is supported by this example script and notebook.
- Masked language modeling chapter of the 🤗 Hugging Face Course.
- Masked language modeling task guide
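
Since RoBERTa is pretrained with masked language modeling only, the base checkpoint can be used directly with the fill-mask pipeline (note that RoBERTa's mask token is <mask>, not [MASK]):

```python
from transformers import pipeline

# roberta-base works with fill-mask out of the box thanks to its MLM head.
unmasker = pipeline("fill-mask", model="roberta-base")
print(unmasker("The goal of life is <mask>."))
```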
|
|
|
Question answering

- A blog on Accelerated Inference with Optimum and Transformers Pipelines with RoBERTa for question answering.
- [RobertaForQuestionAnswering] is supported by this example script and notebook.
- [TFRobertaForQuestionAnswering] is supported by this example script and notebook.
- [FlaxRobertaForQuestionAnswering] is supported by this example script.
- Question answering chapter of the 🤗 Hugging Face Course.
- Question answering task guide
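
A minimal extractive question answering sketch (deepset/roberta-base-squad2 is one RoBERTa checkpoint fine-tuned on SQuAD-style data; any extractive QA checkpoint works the same way):

```python
from transformers import pipeline

# A RoBERTa checkpoint fine-tuned for extractive question answering.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
result = qa(
    question="What tokenizer does RoBERTa use?",
    context="RoBERTa uses a byte-level BPE tokenizer, the same as GPT-2.",
)
print(result["answer"])
```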
|
|
|
Multiple choice |
|
- [RobertaForMultipleChoice] is supported by this example script and notebook. |
|
- [TFRobertaForMultipleChoice] is supported by this example script and notebook. |
|
- Multiple choice task guide |
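
A short sketch of the multiple choice interface: each candidate is paired with the prompt, giving inputs of shape (batch_size, num_choices, sequence_length). The classification head on top of roberta-base is untuned here, so the scores are arbitrary until fine-tuning:

```python
import torch
from transformers import AutoTokenizer, RobertaForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = RobertaForMultipleChoice.from_pretrained("roberta-base")

prompt = "The weather forecast says it will rain all day."
choice0 = "She takes an umbrella with her."
choice1 = "She waters the garden by hand."

# Pair the prompt with each choice, then add a batch dimension of 1.
encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="pt", padding=True)
outputs = model(**{k: v.unsqueeze(0) for k, v in encoding.items()})
print(outputs.logits)  # one score per choice
```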
|
RobertaConfig |
|
[[autodoc]] RobertaConfig |
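
A minimal sketch of the usual configuration pattern (a default RobertaConfig matches the roberta-base architecture, and a model built from it has random weights):

```python
from transformers import RobertaConfig, RobertaModel

# Initializing a RoBERTa configuration (defaults follow roberta-base).
configuration = RobertaConfig()

# Initializing a model with random weights from that configuration.
model = RobertaModel(configuration)

# Accessing the model configuration.
configuration = model.config
```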
|
RobertaTokenizer |
|
[[autodoc]] RobertaTokenizer |
|
- build_inputs_with_special_tokens |
|
- get_special_tokens_mask |
|
- create_token_type_ids_from_sequences |
|
- save_vocabulary |
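
Because RoBERTa's tokenizer is byte-level BPE, a leading space is part of the token, so the same word is encoded differently at the start of a sentence than after a space. A small sketch, assuming the roberta-base vocabulary:

```python
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

# The leading space becomes part of the first token, so the ids differ.
print(tokenizer("Hello world")["input_ids"])
print(tokenizer(" Hello world")["input_ids"])
```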
|
RobertaTokenizerFast |
|
[[autodoc]] RobertaTokenizerFast |
|
- build_inputs_with_special_tokens |
|
|
|
RobertaModel |
|
[[autodoc]] RobertaModel |
|
- forward |
|
RobertaForCausalLM |
|
[[autodoc]] RobertaForCausalLM |
|
- forward |
|
RobertaForMaskedLM |
|
[[autodoc]] RobertaForMaskedLM |
|
- forward |
|
RobertaForSequenceClassification |
|
[[autodoc]] RobertaForSequenceClassification |
|
- forward |
|
RobertaForMultipleChoice |
|
[[autodoc]] RobertaForMultipleChoice |
|
- forward |
|
RobertaForTokenClassification |
|
[[autodoc]] RobertaForTokenClassification |
|
- forward |
|
RobertaForQuestionAnswering |
|
[[autodoc]] RobertaForQuestionAnswering |
|
- forward |
|
|
|
TFRobertaModel |
|
[[autodoc]] TFRobertaModel |
|
- call |
|
TFRobertaForCausalLM |
|
[[autodoc]] TFRobertaForCausalLM |
|
- call |
|
TFRobertaForMaskedLM |
|
[[autodoc]] TFRobertaForMaskedLM |
|
- call |
|
TFRobertaForSequenceClassification |
|
[[autodoc]] TFRobertaForSequenceClassification |
|
- call |
|
TFRobertaForMultipleChoice |
|
[[autodoc]] TFRobertaForMultipleChoice |
|
- call |
|
TFRobertaForTokenClassification |
|
[[autodoc]] TFRobertaForTokenClassification |
|
- call |
|
TFRobertaForQuestionAnswering |
|
[[autodoc]] TFRobertaForQuestionAnswering |
|
- call |
|
|
|
FlaxRobertaModel |
|
[[autodoc]] FlaxRobertaModel |
|
- __call__
|
FlaxRobertaForCausalLM |
|
[[autodoc]] FlaxRobertaForCausalLM |
|
- __call__
|
FlaxRobertaForMaskedLM |
|
[[autodoc]] FlaxRobertaForMaskedLM |
|
- __call__
|
FlaxRobertaForSequenceClassification |
|
[[autodoc]] FlaxRobertaForSequenceClassification |
|
- __call__
|
FlaxRobertaForMultipleChoice |
|
[[autodoc]] FlaxRobertaForMultipleChoice |
|
- __call__
|
FlaxRobertaForTokenClassification |
|
[[autodoc]] FlaxRobertaForTokenClassification |
|
- __call__
|
FlaxRobertaForQuestionAnswering |
|
[[autodoc]] FlaxRobertaForQuestionAnswering |
|
- __call__
|
|
|
|