# I-BERT

## Overview

The I-BERT model was proposed in [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney and Kurt Keutzer. It is a quantized version of RoBERTa that runs inference up to four times faster.

The abstract from the paper is the following:

*Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for efficient inference at the edge, and even at the data center. While quantization can be a viable solution for this, previous works on quantizing Transformer based models use floating-point arithmetic during inference, which cannot efficiently utilize integer-only logical units such as the recent Turing Tensor Cores, or traditional integer-only ARM processors. In this work, we propose I-BERT, a novel quantization scheme for Transformer based models that quantizes the entire inference with integer-only arithmetic. Based on lightweight integer-only approximation methods for nonlinear operations, e.g., GELU, Softmax, and Layer Normalization, I-BERT performs an end-to-end integer-only BERT inference without any floating point calculation. We evaluate our approach on GLUE downstream tasks using RoBERTa-Base/Large. We show that for both cases, I-BERT achieves similar (and slightly higher) accuracy as compared to the full-precision baseline. Furthermore, our preliminary implementation of I-BERT shows a speedup of 2.4 - 4.0x for INT8 inference on a T4 GPU system as compared to FP32 inference. The framework has been developed in PyTorch and has been open-sourced.*
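
The key idea is to replace floating-point nonlinearities with low-order polynomial approximations that can be evaluated with integer arithmetic alone. Below is a minimal float-domain sketch of the second-order GELU approximation (i-GELU); the constants are the erf approximation reported in the paper, but the actual I-BERT kernels evaluate it with integer-only operations:

```python
import numpy as np

def i_gelu_approx(x: np.ndarray) -> np.ndarray:
    """Second-order polynomial approximation of GELU (the i-GELU idea).

    erf(z) is approximated by sign(z) * [a * (min(|z|, -b) + b)**2 + 1]
    with a = -0.2888 and b = -1.769, as described in the I-BERT paper.
    """
    a, b = -0.2888, -1.769
    z = x / np.sqrt(2.0)
    erf_approx = np.sign(z) * (a * (np.minimum(np.abs(z), -b) + b) ** 2 + 1.0)
    # GELU(x) = x * 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + erf_approx)

# Evaluate the approximation on a few points
x = np.linspace(-4.0, 4.0, 9)
print(i_gelu_approx(x))
```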

This model was contributed by [kssteven](https://huggingface.co/kssteven). The original code can be found [here](https://github.com/kssteven418/I-BERT).
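
A minimal usage sketch, assuming the `kssteven/ibert-roberta-base` checkpoint published by the author on the Hub; it runs a standard masked-language-modeling forward pass (quantized, integer-only finetuning follows the recipe in the original repository):

```python
import torch
from transformers import AutoTokenizer, IBertForMaskedLM

# Checkpoint name is an assumption; check the author's Hub page for others.
tokenizer = AutoTokenizer.from_pretrained("kssteven/ibert-roberta-base")
model = IBertForMaskedLM.from_pretrained("kssteven/ibert-roberta-base")

inputs = tokenizer("The capital of France is <mask>.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring token at the mask position
mask_idx = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_idx].argmax(dim=-1)))
```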

## Resources

- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
- [Question answering task guide](../tasks/question_answering)
- [Masked language modeling task guide](../tasks/masked_language_modeling)
- [Multiple choice task guide](../tasks/multiple_choice)

## IBertConfig

[[autodoc]] IBertConfig

## IBertModel

[[autodoc]] IBertModel
    - forward

## IBertForMaskedLM

[[autodoc]] IBertForMaskedLM
    - forward

## IBertForSequenceClassification

[[autodoc]] IBertForSequenceClassification
    - forward

## IBertForMultipleChoice

[[autodoc]] IBertForMultipleChoice
    - forward

## IBertForTokenClassification

[[autodoc]] IBertForTokenClassification
    - forward

## IBertForQuestionAnswering

[[autodoc]] IBertForQuestionAnswering
    - forward