As a consequence, TGlobal attention introduces a few new parameters: global relative position biases and a layer normalization for the global token embeddings.
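The global-token layer normalization can be illustrated with a minimal sketch. This is a hedged, simplified illustration, not the library's implementation: it assumes global tokens are formed by mean-pooling fixed-size blocks of the input sequence, and the helper names (`layer_norm`, `tglobal_global_tokens`) are hypothetical.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-6):
    # Standard layer normalization over the last axis.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def tglobal_global_tokens(token_embeds, block_size, gamma, beta):
    # Sketch: summarize each block of `block_size` tokens into one global
    # token by mean pooling, then apply the dedicated LayerNorm that
    # TGlobal attention adds for global token embeddings.
    seq_len, d = token_embeds.shape
    n_blocks = seq_len // block_size
    blocks = token_embeds[: n_blocks * block_size].reshape(n_blocks, block_size, d)
    global_tokens = blocks.mean(axis=1)  # one summary token per block
    return layer_norm(global_tokens, gamma, beta)

d_model, block_size = 8, 4
x = np.random.randn(16, d_model)
g = tglobal_global_tokens(x, block_size, np.ones(d_model), np.zeros(d_model))
print(g.shape)  # one normalized global token per block
```

The global relative position biases (not shown) would then be learned scalars added to the attention scores between each input token and each global token, indexed by their relative block distance.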