Instead of designing a complicated token mixer to achieve SOTA performance, the goal of this work is to demonstrate that the competence of transformer models largely stems from the general MetaFormer architecture.
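To make the claim concrete, here is a minimal sketch of a MetaFormer-style block in which the token mixer is a pluggable argument. It assumes the usual structure of two residual sub-blocks (normalization, then a token mixer; normalization, then a per-token channel MLP). The function names, the two toy mixers (`mean_mixer`, `identity_mixer`), and the weights are hypothetical and chosen only for illustration; they are not taken from the original work.

```python
import math

def layer_norm(tokens, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance."""
    out = []
    for t in tokens:
        mean = sum(t) / len(t)
        var = sum((v - mean) ** 2 for v in t) / len(t)
        out.append([(v - mean) / math.sqrt(var + eps) for v in t])
    return out

def mean_mixer(tokens):
    """Hypothetical parameter-free mixer: every token becomes the average token."""
    n = len(tokens)
    avg = [sum(col) / n for col in zip(*tokens)]
    return [list(avg) for _ in tokens]

def identity_mixer(tokens):
    """Hypothetical mixer that performs no token mixing at all."""
    return [list(t) for t in tokens]

def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def channel_mlp(tokens, w1, w2):
    """Two-layer MLP with ReLU, applied to each token independently."""
    return [matvec(w2, [max(0.0, h) for h in matvec(w1, t)]) for t in tokens]

def add(a, b):
    """Element-wise sum of two token sequences (the residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def metaformer_block(tokens, token_mixer, w1, w2):
    """General block: only `token_mixer` changes between model variants."""
    tokens = add(tokens, token_mixer(layer_norm(tokens)))
    tokens = add(tokens, channel_mlp(layer_norm(tokens), w1, w2))
    return tokens

# Toy input: 3 tokens with 2 channels each; MLP hidden size 4 (assumed values).
x = [[1.0, 2.0], [3.0, 5.0], [4.0, 9.0]]
w1 = [[0.5, -0.5], [0.1, 0.2], [-0.3, 0.4], [0.2, 0.2]]
w2 = [[0.1, 0.0, -0.2, 0.3], [0.0, 0.2, 0.1, -0.1]]

y_mean = metaformer_block(x, mean_mixer, w1, w2)
y_identity = metaformer_block(x, identity_mixer, w1, w2)
```

Swapping `mean_mixer` for `identity_mixer` (or for attention, in a real model) leaves the rest of the block unchanged, which is the sense in which the surrounding architecture, rather than the specific mixer, is treated as the general design.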