However, BERT heavily relies on the global self-attention block and thus suffers | |
large memory footprint and computation cost. |
However, BERT heavily relies on the global self-attention block and thus suffers | |
large memory footprint and computation cost. |