Preprocess

For masked language modeling, the next step is to load a DistilRoBERTa tokenizer to process the text subfield:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilroberta-base")

As the example above shows, the text field is actually nested inside answers.
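
To make the nesting concrete, here is a minimal sketch in plain Python. The record below is hypothetical (its field names and values are assumptions for illustration, not taken from the dataset), and flatten_record is an illustrative helper, not a library function. It lifts nested fields to dotted top-level keys so the text can be handed to the tokenizer as ordinary strings.

```python
# Hypothetical record: field names and values are illustrative assumptions.
example = {
    "title": "Why is the sky blue?",
    "answers": {
        "text": ["Sunlight scatters off air molecules.", "Blue light scatters the most."],
        "score": [12, 7],
    },
}


def flatten_record(record, sep="."):
    """Lift nested dict fields to top-level keys such as 'answers.text'."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Recurse so deeper levels are flattened as well.
            for sub_key, sub_value in flatten_record(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


flat_example = flatten_record(example)

# The nested field holds a list of strings; join them into a single string
# before passing the result to a tokenizer.
joined_text = " ".join(flat_example["answers.text"])
```

If the data is loaded with the datasets library, its Dataset.flatten() method performs a similar step and exposes the nested field as an answers.text column.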