For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its | |
complexity from O(L^2) to O(Llog(L)), where L is the length of the sequence. |
For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its | |
complexity from O(L^2) to O(Llog(L)), where L is the length of the sequence. |