Here's a TL;DR explaining how the Perceiver works:
The main problem with the self-attention mechanism of the Transformer is that the time and memory requirements scale
quadratically with the sequence length.
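To see where that quadratic cost comes from, here is a minimal sketch of vanilla self-attention (the shapes and variable names are illustrative, not taken from the Perceiver codebase): the score matrix has one entry per pair of positions, so it grows as the square of the sequence length.

```python
import torch

# Illustrative sizes (assumptions, not from the original post).
seq_len, d_model = 2048, 64
queries = torch.randn(seq_len, d_model)
keys = torch.randn(seq_len, d_model)
values = torch.randn(seq_len, d_model)

# (seq_len, d_model) @ (d_model, seq_len) -> (seq_len, seq_len):
# computing and storing this matrix is O(seq_len^2) in time and memory.
scores = queries @ keys.T / d_model**0.5
attn = scores.softmax(dim=-1)  # still a (seq_len, seq_len) tensor in memory
out = attn @ values            # back to (seq_len, d_model)

print(attn.shape)  # torch.Size([2048, 2048])
```

Doubling `seq_len` quadruples the size of `attn`, which is exactly the bottleneck the Perceiver is designed to avoid.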