Here's a TLDR explaining how Perceiver works:
The main problem with the self-attention mechanism of the Transformer is that the time and memory requirements scale
quadratically with the sequence length.
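To see why, consider a minimal sketch (assuming PyTorch; the values of `seq_len` and `d` are illustrative): the attention score matrix compares every position with every other position, so its size grows quadratically with the sequence length.

```python
import torch

seq_len, d = 2048, 64  # illustrative sequence length and head dimension
q = torch.randn(seq_len, d)
k = torch.randn(seq_len, d)

# The score matrix has one entry per pair of positions,
# so it has shape (seq_len, seq_len).
scores = q @ k.transpose(0, 1)
print(scores.shape)  # torch.Size([2048, 2048])

# Doubling the sequence length quadruples the number of entries:
# a 4096 x 4096 matrix holds 4x as many scores as a 2048 x 2048 one.
```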