In those cases, the final hidden states are upsampled to the input sequence length and go through two additional layers.
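The upsampling step can be illustrated with a minimal sketch. It assumes upsampling is done by repeating each pooled hidden state, which the text above does not specify. The function name `upsample_hidden_states` and the toy values are hypothetical, and the two additional layers are not modelled here.

```python
def upsample_hidden_states(hidden, target_len):
    """Repeat each pooled vector until the sequence has target_len positions.

    Assumption: repetition-based (nearest-neighbour) upsampling; the actual
    model may use a different scheme.
    """
    pooled_len = len(hidden)
    if pooled_len == 0:
        raise ValueError("hidden must contain at least one position")
    # Ceiling division: how many input positions each pooled position covers.
    factor = -(-target_len // pooled_len)
    upsampled = [list(vec) for vec in hidden for _ in range(factor)]
    # Trim when target_len is not an exact multiple of pooled_len.
    return upsampled[:target_len]


# Toy example: 2 pooled positions restored to an input length of 5.
pooled = [[0.1, 0.2], [0.3, 0.4]]
full = upsample_hidden_states(pooled, 5)
print(len(full))
```

After this step there is one hidden vector per input token again, which is what a token-level head would need.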