It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining objective and training with
much larger mini-batches and learning rates.