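The intermediate embedding size of the feed forward layers is often bigger than the hidden size of the model. For example, google-bert/bert-base-uncased uses an intermediate size of 3072 and a hidden size of 768, so the feed forward layers are four times wider than the model's hidden states.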
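The short sketch below makes the size difference concrete. It assumes a standard two-layer feed forward block (an up-projection from the hidden size to the intermediate size, an activation, then a down-projection back to the hidden size, both with biases). The defaults are the bert-base-uncased values above. `FeedForwardDims` is a hypothetical helper for illustration, not part of any library. It reports the expansion ratio, the parameter count, and the tensor shapes that pass through the block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedForwardDims:
    """Hypothetical helper describing a transformer feed forward block.

    Assumes two linear layers with biases: hidden -> intermediate -> hidden.
    Defaults match bert-base-uncased (hidden_size=768, intermediate_size=3072).
    """

    hidden_size: int = 768
    intermediate_size: int = 3072

    def expansion_ratio(self) -> float:
        # How much wider the intermediate layer is than the model's hidden size.
        return self.intermediate_size / self.hidden_size

    def num_parameters(self) -> int:
        # Up-projection: hidden -> intermediate (weight matrix + bias vector).
        up = self.hidden_size * self.intermediate_size + self.intermediate_size
        # Down-projection: intermediate -> hidden (weight matrix + bias vector).
        down = self.intermediate_size * self.hidden_size + self.hidden_size
        return up + down

    def activation_shapes(self, batch_size: int, seq_len: int):
        # Shapes of the block input, the intermediate activation, and the output.
        return (
            (batch_size, seq_len, self.hidden_size),
            (batch_size, seq_len, self.intermediate_size),
            (batch_size, seq_len, self.hidden_size),
        )


dims = FeedForwardDims()
print(dims.expansion_ratio())          # 4.0
print(dims.num_parameters())           # 4722432
print(dims.activation_shapes(8, 512))  # ((8, 512, 768), (8, 512, 3072), (8, 512, 768))
```

The intermediate activation holds four times as many values as the block's input and output for the same batch and sequence length, which is why this layer tends to dominate memory use. If the transformers library is installed, the same two numbers can be read from the checkpoint's config through its `hidden_size` and `intermediate_size` attributes.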