The novel convolution heads, together with the rest of the self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning.
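
The sketch below illustrates the idea of such a mixed attention block: some heads are replaced by convolution so that the attention branch captures global context while the convolution branch captures local context. It is a minimal, hypothetical example using a plain depthwise convolution in place of ConvBERT's span-based dynamic convolution; the class and parameter names are illustrative, not the actual Transformers implementation.

```python
import torch
import torch.nn as nn


class MixedAttentionBlock(nn.Module):
    """Toy mixed attention block: self-attention heads + convolution heads."""

    def __init__(self, hidden_size=768, num_heads=12, conv_heads=6, kernel_size=9):
        super().__init__()
        self.attn_heads = num_heads - conv_heads
        head_dim = hidden_size // num_heads
        attn_dim = self.attn_heads * head_dim
        conv_dim = conv_heads * head_dim

        # Global branch: standard self-attention over a reduced number of heads.
        self.attn_in = nn.Linear(hidden_size, attn_dim)
        self.self_attention = nn.MultiheadAttention(attn_dim, self.attn_heads, batch_first=True)

        # Local branch: depthwise 1D convolution standing in for the convolution
        # heads (ConvBERT itself uses span-based dynamic convolution here).
        self.conv_in = nn.Linear(hidden_size, conv_dim)
        self.conv = nn.Conv1d(conv_dim, conv_dim, kernel_size,
                              padding=kernel_size // 2, groups=conv_dim)

        # Project the concatenated head groups back to the model width.
        self.out = nn.Linear(attn_dim + conv_dim, hidden_size)

    def forward(self, hidden_states):
        # Self-attention heads model global context.
        attn_in = self.attn_in(hidden_states)
        attn_out, _ = self.self_attention(attn_in, attn_in, attn_in)

        # Convolution heads model local context along the sequence dimension.
        conv_in = self.conv_in(hidden_states).transpose(1, 2)
        conv_out = self.conv(conv_in).transpose(1, 2)

        # Concatenate both head groups, as in a mixed attention block.
        return self.out(torch.cat([attn_out, conv_out], dim=-1))


x = torch.randn(2, 16, 768)      # (batch, seq_len, hidden)
block = MixedAttentionBlock()
print(block(x).shape)            # torch.Size([2, 16, 768])
```

Because half of the attention heads are replaced by cheaper convolutions, the block spends less compute on local patterns that do not require full attention, which is where the efficiency gain over a pure self-attention block comes from.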