|
|
|
DeBERTa |
|
Overview |
|
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. It is based on Google's BERT model released in 2018 and Facebook's RoBERTa model released in 2019.

It builds on RoBERTa with disentangled attention and an enhanced mask decoder, and is trained on half of the data used for RoBERTa.
|
The abstract from the paper is the following: |
|
Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency of model pretraining and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.
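As a sketch of the disentangled attention mechanism described in the abstract (notation follows the paper: $Q^c$ and $K^c$ are projections of the content vectors, $Q^r$ and $K^r$ are projections of the shared relative position embeddings, and $\delta(i, j)$ is the bucketed relative distance from token $i$ to token $j$), the attention score between tokens $i$ and $j$ is the sum of a content-to-content, a content-to-position, and a position-to-content term:

$$
\tilde{A}_{i,j} = Q_i^{c} {K_j^{c}}^{\top} + Q_i^{c} {K_{\delta(i,j)}^{r}}^{\top} + K_j^{c} {Q_{\delta(j,i)}^{r}}^{\top}
$$

The paper then scales these scores by $1/\sqrt{3d}$ instead of the usual $1/\sqrt{d}$ to account for the three additive terms.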
|
This model was contributed by DeBERTa. The TF 2.0 implementation was contributed by kamalkraj. The original code can be found here.
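As a minimal usage sketch (assuming PyTorch and the microsoft/deberta-base checkpoint from the Hub), the pretrained encoder can be loaded with the Auto classes to extract contextual hidden states:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load the pretrained DeBERTa encoder; "microsoft/deberta-base" is one of the
# official checkpoints published on the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModel.from_pretrained("microsoft/deberta-base")

inputs = tokenizer("DeBERTa improves BERT with disentangled attention.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual embeddings of shape (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```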
|
Resources |
|
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DeBERTa. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource. |
|
|
|
- A blog post on how to Accelerate Large Model Training using DeepSpeed with DeBERTa.
- A blog post on Supercharged Customer Service with Machine Learning with DeBERTa.
|
- [DebertaForSequenceClassification] is supported by this example script and notebook (see the sketch after this list).
- [TFDebertaForSequenceClassification] is supported by this example script and notebook.
- Text classification task guide
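A minimal sequence classification sketch, assuming a hypothetical two-label task on top of the microsoft/deberta-base checkpoint; the classification head is freshly initialized, so predictions are only meaningful after fine-tuning:

```python
import torch
from transformers import DebertaTokenizer, DebertaForSequenceClassification

tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
# num_labels=2 is an illustrative choice; the head weights are newly initialized.
model = DebertaForSequenceClassification.from_pretrained("microsoft/deberta-base", num_labels=2)

inputs = tokenizer("This movie was surprisingly good!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Index of the highest-scoring class (meaningful only after fine-tuning).
print(logits.argmax(dim=-1).item())
```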
|
|
|
- [DebertaForTokenClassification] is supported by this example script and notebook (see the sketch after this list).
- [TFDebertaForTokenClassification] is supported by this example script and notebook.
- Token classification chapter of the 🤗 Hugging Face Course.
- Byte-Pair Encoding tokenization chapter of the 🤗 Hugging Face Course.
- Token classification task guide
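A minimal token classification sketch, assuming a hypothetical three-label tagging scheme on top of microsoft/deberta-base; as above, the head is newly initialized, so this mainly illustrates the shapes involved:

```python
import torch
from transformers import DebertaTokenizer, DebertaForTokenClassification

tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
# num_labels=3 is an illustrative choice (e.g. a toy IOB scheme with one entity type).
model = DebertaForTokenClassification.from_pretrained("microsoft/deberta-base", num_labels=3)

inputs = tokenizer("Hugging Face is based in New York City", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (batch_size, sequence_length, num_labels)

# One predicted label id per token in the input sequence.
print(logits.argmax(dim=-1))
```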
|
|
|
- [DebertaForMaskedLM] is supported by this example script and notebook (see the sketch after this list).
- [TFDebertaForMaskedLM] is supported by this example script and notebook.
- Masked language modeling chapter of the 🤗 Hugging Face Course.
- Masked language modeling task guide
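A minimal masked language modeling sketch with microsoft/deberta-base; how sensible the prediction is depends on whether the checkpoint ships trained language-modeling head weights (if not, 🤗 Transformers warns and initializes them randomly):

```python
import torch
from transformers import DebertaTokenizer, DebertaForMaskedLM

tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
model = DebertaForMaskedLM.from_pretrained("microsoft/deberta-base")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Take the highest-scoring token at the [MASK] position.
mask_positions = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```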
|
|
|
- [DebertaForQuestionAnswering] is supported by this example script and notebook (see the sketch after this list).
- [TFDebertaForQuestionAnswering] is supported by this example script and notebook.
- Question answering chapter of the 🤗 Hugging Face Course.
- Question answering task guide
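A minimal extractive question answering sketch with microsoft/deberta-base; the span-prediction head is newly initialized, so the extracted span is only meaningful after fine-tuning (e.g. on SQuAD):

```python
import torch
from transformers import DebertaTokenizer, DebertaForQuestionAnswering

tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
model = DebertaForQuestionAnswering.from_pretrained("microsoft/deberta-base")

question = "Who proposed DeBERTa?"
context = "DeBERTa was proposed by Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen."
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Most likely start/end token positions of the answer span.
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
print(tokenizer.decode(inputs.input_ids[0, start : end + 1]))
```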
|
|
|
DebertaConfig |
|
[[autodoc]] DebertaConfig |
|
DebertaTokenizer |
|
[[autodoc]] DebertaTokenizer |
|
- build_inputs_with_special_tokens |
|
- get_special_tokens_mask |
|
- create_token_type_ids_from_sequences |
|
- save_vocabulary |
|
DebertaTokenizerFast |
|
[[autodoc]] DebertaTokenizerFast |
|
- build_inputs_with_special_tokens |
|
- create_token_type_ids_from_sequences |
|
|
|
DebertaModel |
|
[[autodoc]] DebertaModel |
|
- forward |
|
DebertaPreTrainedModel |
|
[[autodoc]] DebertaPreTrainedModel |
|
DebertaForMaskedLM |
|
[[autodoc]] DebertaForMaskedLM |
|
- forward |
|
DebertaForSequenceClassification |
|
[[autodoc]] DebertaForSequenceClassification |
|
- forward |
|
DebertaForTokenClassification |
|
[[autodoc]] DebertaForTokenClassification |
|
- forward |
|
DebertaForQuestionAnswering |
|
[[autodoc]] DebertaForQuestionAnswering |
|
- forward |
|
|
|
TFDebertaModel |
|
[[autodoc]] TFDebertaModel |
|
- call |
|
TFDebertaPreTrainedModel |
|
[[autodoc]] TFDebertaPreTrainedModel |
|
- call |
|
TFDebertaForMaskedLM |
|
[[autodoc]] TFDebertaForMaskedLM |
|
- call |
|
TFDebertaForSequenceClassification |
|
[[autodoc]] TFDebertaForSequenceClassification |
|
- call |
|
TFDebertaForTokenClassification |
|
[[autodoc]] TFDebertaForTokenClassification |
|
- call |
|
TFDebertaForQuestionAnswering |
|
[[autodoc]] TFDebertaForQuestionAnswering |
|
- call |
|
|
|
|