Using local self-attention, the memory and time complexity of the query-key matmul operation, which usually represents the memory and time bottleneck in a transformer model, can be reduced from
\(\mathcal{O}(n_s \times n_s)\) to \(\mathcal{O}(n_s \times l_c)\), where \(n_s\) is the sequence length and \(l_c\) is the fixed length of a local attention chunk. Because each query attends only to the \(l_c\) keys in its local chunk rather than to all \(n_s\) keys, the cost grows only linearly in \(n_s\).
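To illustrate where the saving comes from, below is a minimal sketch of chunked local self-attention in PyTorch. It is not the actual Hugging Face implementation (production variants typically also let each chunk attend to neighbouring chunks and apply causal masking); the function name `local_self_attention` and the `chunk_len` parameter are hypothetical names chosen for this example. Each query chunk attends only to the keys in its own chunk, so every score matrix is \(l_c \times l_c\) instead of \(n_s \times n_s\).

```python
# Minimal sketch of chunked local self-attention (hypothetical, not the
# exact Hugging Face implementation). Queries, keys, and values are split
# into chunks of length chunk_len, and each query chunk attends only to
# its own key chunk, so every matmul is (chunk_len x chunk_len).
import torch
import torch.nn.functional as F

def local_self_attention(q, k, v, chunk_len):
    """q, k, v: (batch, seq_len, dim); seq_len must be divisible by chunk_len."""
    b, n_s, d = q.shape
    n_chunks = n_s // chunk_len
    # Reshape so attention is computed independently inside each chunk:
    # (batch, n_chunks, chunk_len, dim)
    q = q.view(b, n_chunks, chunk_len, d)
    k = k.view(b, n_chunks, chunk_len, d)
    v = v.view(b, n_chunks, chunk_len, d)
    # Per-chunk query-key matmul: (chunk_len x chunk_len) scores, so the
    # total cost is n_chunks * chunk_len^2 = n_s * chunk_len, not n_s^2.
    scores = torch.einsum("bcqd,bckd->bcqk", q, k) / d ** 0.5
    probs = F.softmax(scores, dim=-1)
    out = torch.einsum("bcqk,bckd->bcqd", probs, v)
    return out.reshape(b, n_s, d)

# Example: a sequence of 1024 tokens with local chunks of 64 tokens.
q = k = v = torch.randn(1, 1024, 32)
out = local_self_attention(q, k, v, chunk_len=64)
print(out.shape)  # torch.Size([1, 1024, 32])
```

With \(n_s = 1024\) and \(l_c = 64\), the attention scores occupy \(16 \times 64 \times 64 = 65{,}536\) entries per batch and head instead of \(1024 \times 1024 = 1{,}048{,}576\), a 16x reduction that grows with sequence length.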