Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
The abstract from the paper is the following:
We re-evaluate the standard practice of sharing weights between input and output embeddings in state-of-the-art
pre-trained language models.