The Vision Transformer (ViT) replaces convolutions entirely with a pure Transformer architecture applied directly to sequences of image patches.
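
To make the idea concrete, the sketch below shows one way an image can be fed to a Transformer without convolutions: it is cut into non-overlapping patches, each patch is flattened, and a single linear projection maps it to a token embedding. This is a minimal NumPy illustration, not a reference implementation. The helper names `patchify` and `embed_patches`, the 224x224 input, the patch size of 16, and the embedding width of 768 are all assumptions chosen for the example, and the random weights stand in for learned parameters. Class tokens, position embeddings, and the Transformer encoder itself are omitted.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, C) -> (gh, gw, p, p, C) -> (gh * gw, p * p * C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def embed_patches(image, patch_size, weight, bias):
    """Project each flattened patch to a token embedding with one linear map."""
    tokens = patchify(image, patch_size)
    return tokens @ weight + bias


# Hypothetical sizes for illustration only.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
weight = rng.normal(size=(16 * 16 * 3, 768)) * 0.02  # stand-in for learned weights
bias = np.zeros(768)

tokens = embed_patches(image, 16, weight, bias)
print(tokens.shape)  # (196, 768): a 14 x 14 grid of patches, one token each
```

The resulting array is an ordinary token sequence, so from this point on the model can use standard Transformer layers rather than convolutional feature extractors.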