Hybrid Vision Transformer (ViT Hybrid)
Overview
The hybrid Vision Transformer (ViT) model was proposed in "An Image is Worth 16x16 Words: Transformers for Image Recognition
at Scale" by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk
Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob
Uszkoreit, and Neil Houlsby.