File size: 578 Bytes
5fa1a76 |
1 2 3 4 5 6 7 8 9 10 |
Wav2Vec2 for audio classification and automatic speech recognition (ASR) Vision Transformer (ViT) and ConvNeXT for image classification DETR for object detection Mask2Former for image segmentation GLPN for depth estimation BERT for NLP tasks like text classification, token classification and question answering that use an encoder GPT2 for NLP tasks like text generation that use a decoder BART for NLP tasks like summarization and translation that use an encoder-decoder Before you go further, it is good to have some basic knowledge of the original Transformer architecture. |