This is done to support randomly initializing this layer at fine-tuning time,
which the paper shows yields better results in some cases.
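
As a rough illustration of re-initializing a single layer before fine-tuning, here is a minimal PyTorch sketch. The text does not say which layer is meant, so `TinyModel`, its `head` attribute, the `reinit_layer` helper, and the `std=0.02` value are all hypothetical stand-ins chosen for the example, not the model's actual structure or the paper's recipe.

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for a pretrained model; not from the source."""

    def __init__(self, hidden_size: int = 8, num_labels: int = 2):
        super().__init__()
        self.encoder = nn.Linear(hidden_size, hidden_size)
        self.head = nn.Linear(hidden_size, num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(torch.tanh(self.encoder(x)))


def reinit_layer(layer: nn.Linear, std: float = 0.02) -> None:
    """Overwrite a linear layer's parameters in place with fresh values.

    The normal(0, std) scheme and std=0.02 are assumptions for this sketch.
    """
    nn.init.normal_(layer.weight, mean=0.0, std=std)
    if layer.bias is not None:
        nn.init.zeros_(layer.bias)


torch.manual_seed(0)
model = TinyModel()  # pretend these weights were loaded from a checkpoint

# Keep copies so the effect of the re-initialization can be checked.
encoder_before = model.encoder.weight.detach().clone()
head_before = model.head.weight.detach().clone()

# Re-initialize only the chosen layer; every other layer keeps its weights.
reinit_layer(model.head)
```

After the call, only `model.head` holds new random weights, while `model.encoder` is unchanged, so fine-tuning starts from the loaded weights everywhere except the re-initialized layer.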