The abstract from the paper is the following:
We re-evaluate the standard practice of sharing weights between input and output embeddings in state-of-the-art
pre-trained language models.