File size: 578 Bytes
5fa1a76
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
Wav2Vec2 for audio classification and automatic speech recognition (ASR)
Vision Transformer (ViT) and ConvNeXT for image classification
DETR for object detection
Mask2Former for image segmentation
GLPN for depth estimation
BERT for NLP tasks like text classification, token classification and question answering that use an encoder
GPT2 for NLP tasks like text generation that use a decoder
BART for NLP tasks like summarization and translation that use an encoder-decoder

Before you go further, it is good to have some basic knowledge of the original Transformer architecture.