Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
These changes introduce desirable properties of convolutional neural networks (CNNs)
to the ViT architecture (\ie shift, scale, and distortion invariance) while maintaining the merits of Transformers (\ie dynamic attention,
global context, and better generalization).