A common belief is that their attention-based token mixer module contributes most to their competence.