5fa1a76
1
A common belief is that their attention-based token mixer module contributes most to their competence.