Also, Transformers have to adopt sparse versions of point-wise self-attention for long-series efficiency, resulting in an information utilization bottleneck.
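
To make the trade-off concrete, here is a minimal, hypothetical sketch (not the sparsification scheme of any particular model) that builds a strided attention mask: each query position is allowed to attend to only a fraction of the key positions, which is what saves compute on long series but also why some pairwise dependencies are simply never modeled.

```python
import torch

def strided_attention_mask(seq_len: int, stride: int) -> torch.Tensor:
    """Boolean mask where True means "query i may attend to key j".

    Illustrative strided pattern: a query attends only to earlier positions
    whose distance is a multiple of `stride`.
    """
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]            # only look at past/current keys
    strided = (idx[:, None] - idx[None, :]) % stride == 0
    return causal & strided

mask = strided_attention_mask(seq_len=8, stride=2)
print(mask.int())
# Fraction of query-key pairs actually used, compared to full causal attention:
full = torch.tril(torch.ones(8, 8, dtype=torch.bool))
print(mask.sum().item() / full.sum().item())  # well below 1.0 -> information is discarded
```

The unused query-key pairs are exactly the "information utilization bottleneck" the sentence above refers to: cheaper attention, but dependencies outside the sparse pattern are lost.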