By reallocating the input embedding parameters in the Transformer layers, we achieve dramatically better performance on
standard natural language understanding tasks with the same number of parameters during fine-tuning.
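The parameter arithmetic behind this reallocation can be sketched roughly as follows. This is a minimal illustration, not the authors' configuration: the vocabulary size, hidden size, layer count, and reduced embedding size are all hypothetical, the per-layer cost uses the common 12 * H^2 approximation (attention plus a feed-forward block of inner size 4 * H, ignoring biases and layer norms), and spending the freed budget on extra layers is only one possible way to reallocate it.

```python
def layer_params(hidden: int) -> int:
    """Approximate weights in one Transformer layer: 4*H^2 for attention
    plus 8*H^2 for a feed-forward block with inner size 4*H."""
    return 12 * hidden * hidden


def finetune_params(vocab: int, embed: int, hidden: int, layers: int) -> int:
    """Input embedding, plus a projection to the hidden size when the
    embedding is narrower, plus the Transformer layers."""
    projection = embed * hidden if embed != hidden else 0
    return vocab * embed + projection + layers * layer_params(hidden)


def extra_layers(vocab: int, hidden: int, small_embed: int) -> int:
    """Whole layers that fit in the budget freed by shrinking the input
    embedding from `hidden` to `small_embed` dimensions."""
    freed = vocab * (hidden - small_embed) - small_embed * hidden
    return max(freed, 0) // layer_params(hidden)


# Hypothetical sizes chosen only for illustration.
VOCAB, HIDDEN, LAYERS, SMALL_EMBED = 120_000, 768, 12, 128

baseline = finetune_params(VOCAB, HIDDEN, HIDDEN, LAYERS)
added = extra_layers(VOCAB, HIDDEN, SMALL_EMBED)
reallocated = finetune_params(VOCAB, SMALL_EMBED, HIDDEN, LAYERS + added)

print(f"baseline params:    {baseline:,}")
print(f"extra layers:       {added}")
print(f"reallocated params: {reallocated:,}")
```

Under these assumptions the reallocated model stays within the baseline's fine-tuning budget while devoting far more of it to the Transformer layers.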