feed forward chunking
In each residual attention block in transformers, the self-attention layer is usually followed by 2 feed forward layers.
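
Because these feed forward layers act on each sequence position independently, their computation can be split into chunks along the sequence dimension and run one chunk at a time, lowering peak memory at the cost of a little extra runtime. The sketch below is only an illustration of that idea, not the library's implementation; the module name `ChunkedFeedForward` and parameters such as `chunk_size` are assumptions chosen for the example.

```python
import torch
import torch.nn as nn


class ChunkedFeedForward(nn.Module):
    """Illustrative sketch of feed forward chunking (assumed names/sizes)."""

    def __init__(self, hidden_size: int, intermediate_size: int, chunk_size: int = 0):
        super().__init__()
        # The 2 feed forward layers that follow self-attention in the block.
        self.dense_in = nn.Linear(hidden_size, intermediate_size)
        self.dense_out = nn.Linear(intermediate_size, hidden_size)
        self.act = nn.GELU()
        self.chunk_size = chunk_size  # 0 disables chunking

    def _ff(self, x: torch.Tensor) -> torch.Tensor:
        return self.dense_out(self.act(self.dense_in(x)))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.chunk_size == 0:
            # No chunking: process the whole sequence at once.
            return self._ff(hidden_states)
        # Each position is processed independently, so we can split the
        # sequence dimension into chunks and process them one at a time,
        # trading some speed for a smaller peak memory footprint.
        chunks = hidden_states.split(self.chunk_size, dim=1)
        return torch.cat([self._ff(chunk) for chunk in chunks], dim=1)


# Hypothetical usage:
# layer = ChunkedFeedForward(hidden_size=768, intermediate_size=3072, chunk_size=64)
# out = layer(torch.randn(2, 512, 768))  # (batch, seq_len, hidden)
```

Chunked and unchunked forward passes produce the same output (up to floating-point accumulation order); only the amount of intermediate activation memory held at once changes.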