The abstract from the paper is the following: Vision Transformers (ViT) have shown rapid progress in computer vision tasks, achieving promising results on various benchmarks.