The authors show that the combination of both is useful for training with large batch sizes, and has a significant
impact on transfer learning.