Preprocess | |
For masked language modeling, the next step is to load a DistilRoBERTa tokenizer to process the text subfield: | |
from transformers import AutoTokenizer | |
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilroberta-base") | |
You'll notice from the example above, the text field is actually nested inside answers. |