ConvBERT

Overview

The ConvBERT model was proposed in ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.

The abstract from the paper is the following:

Pre-trained language models like BERT and its variants have recently achieved impressive performance in various natural language understanding tasks. However, BERT heavily relies on the global self-attention block and thus suffers large memory footprint and computation cost. Although all its attention heads query on the whole input sequence for generating the attention map from a global perspective, we observe some heads only need to learn local dependencies, which means the existence of computation redundancy. We therefore propose a novel span-based dynamic convolution to replace these self-attention heads to directly model local dependencies. The novel convolution heads, together with the rest self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning. We equip BERT with this mixed attention design and build a ConvBERT model. Experiments have shown that ConvBERT significantly outperforms BERT and its variants in various downstream tasks, with lower training cost and fewer model parameters. Remarkably, ConvBERTbase model achieves 86.4 GLUE score, 0.7 higher than ELECTRAbase, while using less than 1/4 training cost. Code and pre-trained models will be released.

This model was contributed by abhishek. The original implementation can be found here: https://github.com/yitu-opensource/ConvBert

Usage tips

ConvBERT training tips are similar to those of BERT. For usage tips, refer to the BERT documentation.
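
Because the API mirrors BERT's, a forward pass looks the same as it would for a BERT model. The following is a minimal sketch, assuming PyTorch and a `transformers` installation that includes ConvBERT. The hyperparameter values are deliberately tiny, hypothetical numbers chosen so the model builds without downloading anything; they do not correspond to any released checkpoint.

```python
import torch
from transformers import ConvBertConfig, ConvBertModel

# Tiny, hypothetical hyperparameters for illustration only;
# these are NOT the values of any released checkpoint.
config = ConvBertConfig(
    vocab_size=100,
    hidden_size=64,
    embedding_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
    head_ratio=2,        # ratio used to reduce the number of self-attention heads
    conv_kernel_size=9,  # kernel (span) size of the convolution heads
)

# Randomly initialized weights: nothing is downloaded.
model = ConvBertModel(config)
model.eval()

# A batch of 2 random sequences of 16 token ids, with no padding.
input_ids = torch.randint(0, config.vocab_size, (2, 16))
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)

# One hidden vector of size `hidden_size` per input token.
hidden_shape = tuple(outputs.last_hidden_state.shape)
print(hidden_shape)
```

In practice you would more likely load pretrained weights with `from_pretrained` and pair the model with `ConvBertTokenizer` or `ConvBertTokenizerFast`. The randomly initialized model above only demonstrates the input and output shapes.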

Resources

- Text classification task guide

- Token classification task guide

- Question answering task guide

- Masked language modeling task guide

- Multiple choice task guide

|
ConvBertConfig |
|
[[autodoc]] ConvBertConfig |
|
ConvBertTokenizer |
|
[[autodoc]] ConvBertTokenizer |
|
- build_inputs_with_special_tokens |
|
- get_special_tokens_mask |
|
- create_token_type_ids_from_sequences |
|
- save_vocabulary |
|
ConvBertTokenizerFast |
|
[[autodoc]] ConvBertTokenizerFast |
|
|
|
ConvBertModel |
|
[[autodoc]] ConvBertModel |
|
- forward |
|
ConvBertForMaskedLM |
|
[[autodoc]] ConvBertForMaskedLM |
|
- forward |
|
ConvBertForSequenceClassification |
|
[[autodoc]] ConvBertForSequenceClassification |
|
- forward |
|
ConvBertForMultipleChoice |
|
[[autodoc]] ConvBertForMultipleChoice |
|
- forward |
|
ConvBertForTokenClassification |
|
[[autodoc]] ConvBertForTokenClassification |
|
- forward |
|
ConvBertForQuestionAnswering |
|
[[autodoc]] ConvBertForQuestionAnswering |
|
- forward |
|
|
|
TFConvBertModel |
|
[[autodoc]] TFConvBertModel |
|
- call |
|
TFConvBertForMaskedLM |
|
[[autodoc]] TFConvBertForMaskedLM |
|
- call |
|
TFConvBertForSequenceClassification |
|
[[autodoc]] TFConvBertForSequenceClassification |
|
- call |
|
TFConvBertForMultipleChoice |
|
[[autodoc]] TFConvBertForMultipleChoice |
|
- call |
|
TFConvBertForTokenClassification |
|
[[autodoc]] TFConvBertForTokenClassification |
|
- call |
|
TFConvBertForQuestionAnswering |
|
[[autodoc]] TFConvBertForQuestionAnswering |
|
- call |
|
|
|
|