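The query-key matmul operation usually represents the memory and time bottleneck in a transformer model. Using local self-attention, each position attends only to positions within a fixed-size chunk of neighboring positions. This reduces the memory and time complexity of that operation from \(\mathcal{O}(n_s \times n_s)\) to \(\mathcal{O}(n_s \times l_c)\), with \(n_s\) being the sequence length and \(l_c\) the chunk length. For a fixed \(l_c\), the cost therefore grows linearly rather than quadratically in \(n_s\).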
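The sketch below is a minimal pure-Python illustration of chunked local self-attention. It is not any library's implementation. It assumes that each query attends to the keys in its own chunk plus a fixed number of preceding and following chunks. The function names and the parameters `chunk_length`, `num_chunks_before` and `num_chunks_after` are hypothetical and chosen for illustration. The code counts query-key products so the cost can be compared with full attention. With \(n_s = 64\) and a chunk length of 8, full attention scores 4096 pairs. Local attention, looking back one chunk, scores 960.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def _softmax(scores: List[float]) -> List[float]:
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _attend(query: List[float], keys: Matrix, values: Matrix) -> Tuple[List[float], int]:
    # Scaled dot-product attention for one query; also returns the number
    # of query-key products computed.
    d = len(query)
    scores = [sum(qi * ki for qi, ki in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = _softmax(scores)
    dim_v = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
    return out, len(keys)


def full_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, int]:
    # Every query scores every key: n_s * n_s query-key products.
    out, n_products = [], 0
    for query in q:
        o, n = _attend(query, k, v)
        out.append(o)
        n_products += n
    return out, n_products


def local_attention(q: Matrix, k: Matrix, v: Matrix, chunk_length: int,
                    num_chunks_before: int = 1, num_chunks_after: int = 0) -> Tuple[Matrix, int]:
    # Hypothetical chunked local attention: positions are split into chunks of
    # chunk_length; a query in chunk c attends only to keys in chunks
    # [c - num_chunks_before, c + num_chunks_after], clipped at the sequence ends.
    n_s = len(q)
    num_chunks = math.ceil(n_s / chunk_length)
    out, n_products = [], 0
    for i, query in enumerate(q):
        c = i // chunk_length
        first = max(0, c - num_chunks_before) * chunk_length
        last = min(n_s, min(num_chunks, c + num_chunks_after + 1) * chunk_length)
        o, n = _attend(query, k[first:last], v[first:last])
        out.append(o)
        n_products += n
    return out, n_products


# Usage: compare the number of query-key products on random inputs.
random.seed(0)
n_s, d = 64, 4
q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_s)]
k = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_s)]
v = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_s)]

_, full_cost = full_attention(q, k, v)
_, local_cost = local_attention(q, k, v, chunk_length=8)
print(full_cost)   # 4096 = 64 * 64
print(local_cost)  # 960 = 8 * 8 (first chunk) + 56 * 16 (remaining chunks)
```

When `chunk_length` is at least \(n_s\), the window covers the whole sequence, and the sketch reduces to full attention. For a fixed chunk length, doubling \(n_s\) roughly doubles `local_cost`, while `full_cost` quadruples.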