Preprocess

For masked language modeling, the next step is to load a DistilRoBERTa tokenizer to process the `text` subfield:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilroberta-base")
```

You'll notice from the example above, the `text` field is actually nested inside `answers`.
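
To make the nesting concrete, here is a minimal sketch in plain Python. The record below is hypothetical: its values, the extra `title` and `score` keys, and the assumption that `text` holds a list of strings are illustrative only, as is the helper name `extract_answer_text`. The only thing taken from the text above is that `text` sits inside `answers`.

```python
# Hypothetical record: the values and the "title"/"score" keys are made up.
# Only the nesting of "text" inside "answers" comes from the text above.
example = {
    "title": "sample question",
    "answers": {
        "text": ["First answer.", "Second answer."],
        "score": [3, 1],
    },
}


def extract_answer_text(record):
    """Join the strings stored under record["answers"]["text"] with spaces.

    Assumes "text" is a list of strings; an empty list yields "".
    """
    return " ".join(record["answers"]["text"])


joined = extract_answer_text(example)
print(joined)
```

A string extracted this way can then be passed to the tokenizer loaded above, for example `tokenizer(joined)`.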