The abstract from the paper is the following: | |
With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves | |
better performance than pretraining approaches based on autoregressive language modeling. |