T5 is pretrained with a mix of supervised training (on GLUE and SuperGLUE tasks) and self-supervised training, in which 15% of the input tokens are randomly sampled and dropped out for the model to reconstruct.
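
The self-supervised objective described above can be sketched in a few lines of plain Python. This is a minimal illustration, not T5's actual preprocessing code: the function name `corrupt_tokens`, its parameters, and the whitespace tokenization are assumptions made for the example. It follows the convention from the T5 paper of replacing each run of consecutive dropped tokens with a single sentinel (written here as `<extra_id_N>`) and ending the target with a final sentinel.

```python
import random


def corrupt_tokens(tokens, drop_rate=0.15, seed=0):
    """Hypothetical helper: build an (inputs, targets) pair by dropping tokens.

    Roughly `drop_rate` of the tokens are sampled at random. Each run of
    consecutive dropped tokens is replaced in the inputs by one sentinel,
    and the targets list each sentinel followed by the tokens it hides.
    """
    if not tokens:
        return [], []

    rng = random.Random(seed)
    # Drop at least one token so short inputs still produce a target.
    n_drop = min(len(tokens), max(1, round(len(tokens) * drop_rate)))
    dropped = set(rng.sample(range(len(tokens)), n_drop))

    inputs, targets = [], []
    sentinel_id = 0
    prev_dropped = False
    for i, tok in enumerate(tokens):
        if i in dropped:
            if not prev_dropped:
                # Start of a new dropped span: emit one sentinel on both sides.
                sentinel = f"<extra_id_{sentinel_id}>"
                sentinel_id += 1
                inputs.append(sentinel)
                targets.append(sentinel)
            targets.append(tok)
            prev_dropped = True
        else:
            inputs.append(tok)
            prev_dropped = False

    # Assumption taken from the T5 paper: a final sentinel closes the target.
    targets.append(f"<extra_id_{sentinel_id}>")
    return inputs, targets


if __name__ == "__main__":
    words = "Thank you for inviting me to your party last week".split()
    corrupted, target = corrupt_tokens(words)
    print(" ".join(corrupted))
    print(" ".join(target))
```

Real pipelines operate on subword IDs rather than whitespace-split words and may sample whole spans instead of individual tokens, so treat this only as a picture of the input/target format.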