To verify this, we deliberately replace the attention module in transformers with an embarrassingly simple spatial pooling operator to conduct only the most basic token mixing.
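To illustrate the idea, here is a minimal sketch of such a pooling-based token mixer in PyTorch. The class name `Pooling`, the channel-first `(batch, channels, height, width)` layout, and the subtraction of the input (so the operator only mixes information from neighboring tokens on top of the block's residual path) are illustrative assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn


class Pooling(nn.Module):
    """Token mixer that replaces attention with simple spatial average pooling."""

    def __init__(self, pool_size: int = 3):
        super().__init__()
        # Stride 1 and symmetric padding keep the spatial resolution unchanged.
        self.pool = nn.AvgPool2d(
            pool_size, stride=1, padding=pool_size // 2, count_include_pad=False
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Subtract the input so only the neighborhood aggregation is added
        # by the surrounding residual connection.
        return self.pool(x) - x


# Usage sketch: mix tokens of a (batch, channels, height, width) feature map.
features = torch.randn(1, 64, 14, 14)
mixed = Pooling(pool_size=3)(features)
print(mixed.shape)  # torch.Size([1, 64, 14, 14])
```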