For one, we replace dot-product attention with one that uses locality-sensitive hashing, changing its
complexity from O(L^2) to O(L log L), where L is the length of the sequence.
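
To make the idea concrete, below is a minimal sketch of the random-rotation LSH scheme the Reformer paper uses to bucket queries, written in PyTorch. The function name `lsh_hash` and all parameter values are illustrative, not part of any library API; the sketch only shows the hashing and sorting step, not the full chunked attention.

```python
import torch

def lsh_hash(vectors: torch.Tensor, n_buckets: int, seed: int = 0) -> torch.Tensor:
    # Angular LSH as described in the Reformer paper: project each vector onto
    # n_buckets // 2 random directions, then take the argmax over the
    # concatenation [proj, -proj] to assign one of n_buckets buckets.
    # Nearby vectors (small angle) land in the same bucket with high probability.
    torch.manual_seed(seed)
    d = vectors.shape[-1]
    rotations = torch.randn(d, n_buckets // 2)
    projected = vectors @ rotations                       # (L, n_buckets // 2)
    scores = torch.cat([projected, -projected], dim=-1)   # (L, n_buckets)
    return scores.argmax(dim=-1)                          # bucket id per position

# Attention is then restricted to positions sharing a bucket: sorting by
# bucket id and attending within fixed-size chunks is what brings the cost
# down from O(L^2) toward O(L log L).
L, d = 1024, 64
queries = torch.randn(L, d)
bucket_ids = lsh_hash(queries, n_buckets=32)
order = bucket_ids.argsort()  # same-bucket tokens become adjacent after sorting
```

Sorting by bucket id is the key trick: once similar queries are adjacent, each position only needs to attend within its local chunk (plus a neighboring one, in the full method) instead of over all L positions.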