ViT hybrid is a slight variant of the plain Vision Transformer: instead of splitting the raw image into patches,
it leverages a convolutional backbone (specifically, BiT), whose output features are used as the initial "tokens" for the Transformer.
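
The idea of turning backbone features into tokens can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: `toy_backbone`, `feature_map_to_tokens`, the 4x4 input, and the two 2x2 kernels are hypothetical names and values chosen for the example, and a single strided convolution stands in for BiT. It is not the actual ViT hybrid implementation.

```python
# Toy illustration only: one strided convolution stands in for the BiT
# backbone. All names and values here are hypothetical.

def toy_backbone(image, kernels, stride):
    """Apply one strided convolution per output channel to a 2D image.

    image:   H x W list of floats (single input channel, for simplicity)
    kernels: list of k x k kernels, one per output channel
    Returns a feature map indexed as [channel][row][col].
    """
    k = len(kernels[0])
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    feature_map = []
    for kernel in kernels:
        channel = []
        for i in range(out_h):
            row = []
            for j in range(out_w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        acc += kernel[di][dj] * image[i * stride + di][j * stride + dj]
                row.append(acc)
            channel.append(row)
        feature_map.append(channel)
    return feature_map


def feature_map_to_tokens(feature_map):
    """Flatten a [C][H][W] feature map into H*W tokens of dimension C."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    # Each spatial position becomes one token holding its C channel values.
    return [
        [feature_map[c][i][j] for c in range(channels)]
        for i in range(height)
        for j in range(width)
    ]


# A 4x4 "image" and two hypothetical 2x2 kernels (mean filter, horizontal difference).
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
kernels = [
    [[0.25, 0.25], [0.25, 0.25]],
    [[-1.0, 1.0], [-1.0, 1.0]],
]

tokens = feature_map_to_tokens(toy_backbone(image, kernels, stride=2))
print(len(tokens), len(tokens[0]))  # number of tokens, token dimension
```

Here the 2x2 feature map with two channels yields four tokens of dimension two. In a real model, these tokens would then typically be projected to the Transformer's hidden size and combined with position embeddings before entering the encoder.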