Each block consists of attention and Mix-FFN layers.
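
To make the block structure concrete, here is a minimal NumPy sketch of one such block. It is an illustration under stated assumptions, not the authors' implementation. It assumes Mix-FFN follows the SegFormer-style design (a linear layer, a 3x3 depthwise convolution with zero padding, GELU, then a second linear layer), that both sub-layers use pre-normalization and residual connections, and that the attention is plain single-head self-attention standing in for whatever efficient variant the model actually uses. Biases and learnable normalization parameters are omitted, and all function and parameter names (`block`, `mix_ffn`, `init_params`, and so on) are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each token over its channel dimension (no learnable scale/shift).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def self_attention(x, wq, wk, wv, wo):
    # Plain single-head self-attention over N tokens; x has shape (N, C).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return (weights @ v) @ wo

def depthwise_conv3x3(x, kernel):
    # x: (H, W, C), kernel: (3, 3, C). Each channel is filtered independently,
    # with zero padding so the spatial size is preserved.
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernel[dy, dx, :]
    return out

def mix_ffn(x, h, w, w1, dw_kernel, w2):
    # Assumed layout: linear -> 3x3 depthwise conv -> GELU -> linear.
    hidden = x @ w1                                  # (N, C_hidden)
    grid = hidden.reshape(h, w, -1)                  # tokens back onto the H x W grid
    hidden = depthwise_conv3x3(grid, dw_kernel).reshape(h * w, -1)
    return gelu(hidden) @ w2                         # (N, C)

def block(x, h, w, p):
    # One block: attention sub-layer, then Mix-FFN sub-layer, each with a residual.
    x = x + self_attention(layer_norm(x), p["wq"], p["wk"], p["wv"], p["wo"])
    x = x + mix_ffn(layer_norm(x), h, w, p["w1"], p["dw"], p["w2"])
    return x

def init_params(c, hidden, seed=0):
    # Small random weights, for illustration only.
    rng = np.random.default_rng(seed)
    shapes = {"wq": (c, c), "wk": (c, c), "wv": (c, c), "wo": (c, c),
              "w1": (c, hidden), "dw": (3, 3, hidden), "w2": (hidden, c)}
    return {name: 0.1 * rng.normal(size=shape) for name, shape in shapes.items()}

if __name__ == "__main__":
    H, W, C = 4, 5, 8                                # a 4x5 feature map with 8 channels
    tokens = np.random.default_rng(1).normal(size=(H * W, C))
    out = block(tokens, H, W, init_params(C, hidden=4 * C))
    print(out.shape)                                 # (20, 8)
```

The depthwise convolution is the only step that needs the spatial layout, which is why `mix_ffn` takes the grid height and width in addition to the flattened token sequence.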