The abstract from the paper is the following:
We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT)
in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.
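The abstract does not say where the convolutions go. The CvT paper describes a convolutional token embedding: an overlapping, strided convolution that turns an image (or feature map) into a token sequence. ViT's patch embedding is the special case where kernel size equals stride and padding is zero. The sketch below illustrates only this idea, under stated assumptions. It is a hypothetical, simplified NumPy version (`conv_token_embedding` is an illustrative name, not an API of the paper or of any library), it omits normalization and the transformer blocks, and the kernel, stride, padding and channel sizes are chosen for illustration.

```python
import numpy as np


def conv_token_embedding(image, weight, bias, stride, padding):
    """Hypothetical sketch: map an image of shape (C_in, H, W) to a token
    sequence of shape (H_out * W_out, C_out) with a strided 2D convolution.

    weight has shape (C_out, C_in, k, k) and bias has shape (C_out,).
    Returns the tokens and the (H_out, W_out) grid they were flattened from.
    """
    c_in, h, w = image.shape
    c_out, w_c_in, k, k2 = weight.shape
    assert w_c_in == c_in and k == k2, "weight must be (C_out, C_in, k, k)"
    # Zero-pad the two spatial dimensions only, not the channel dimension.
    padded = np.pad(image, ((0, 0), (padding, padding), (padding, padding)))
    h_out = (h + 2 * padding - k) // stride + 1
    w_out = (w + 2 * padding - k) // stride + 1
    out = np.empty((h_out, w_out, c_out))
    for i in range(h_out):
        for j in range(w_out):
            window = padded[:, i * stride:i * stride + k, j * stride:j * stride + k]
            # Contract over (C_in, k, k) to get all C_out channels for this position.
            out[i, j] = np.tensordot(weight, window, axes=([1, 2, 3], [0, 1, 2])) + bias
    # Flatten the spatial grid row by row into a sequence of tokens.
    return out.reshape(h_out * w_out, c_out), (h_out, w_out)


rng = np.random.default_rng(0)
image = rng.standard_normal((3, 224, 224))

# Overlapping embedding (kernel 7, stride 4, padding 2): illustrative sizes.
w_overlap = rng.standard_normal((64, 3, 7, 7)) * 0.01
tokens, grid = conv_token_embedding(image, w_overlap, np.zeros(64), stride=4, padding=2)
print(tokens.shape, grid)  # (3136, 64) (56, 56)

# ViT-style patchify as a special case: kernel == stride, no padding, no overlap.
w_patch = rng.standard_normal((64, 3, 16, 16)) * 0.01
vit_tokens, vit_grid = conv_token_embedding(image, w_patch, np.zeros(64), stride=16, padding=0)
print(vit_tokens.shape, vit_grid)  # (196, 64) (14, 14)
```

With overlap, neighbouring tokens share pixels. A convolution therefore adds local spatial context before any attention is computed, which non-overlapping ViT patches do not.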