Implications
Transformers models based on the BERT (Bidirectional Encoder Representations from Transformers) architecture, or its variants such as DistilBERT and RoBERTa, run best on Inf1 for non-generative tasks such as extractive question answering, sequence classification, and token classification.