In this paper, we propose a novel large kernel attention (LKA) module to enable self-adaptive and long-range correlations in self-attention while avoiding the above issues.