Preprocess

The next step is to load a DistilBERT tokenizer to preprocess the tokens field:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

As the example tokens field above shows, the input looks as if it has already been tokenized: it is a list of words rather than a raw string.
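Because the tokens field is already split into words, one way to hand it to the tokenizer is to pass `is_split_into_words=True`, so the list is treated as a single pre-split sequence and each word can still be broken into subwords. The sketch below is a minimal illustration, not necessarily the exact approach this guide goes on to use; the helper names `tokenize_words` and `main` and the sample sentence are hypothetical and chosen only for the example.

```python
from typing import Any, List


def tokenize_words(tokenizer: Any, words: List[str]) -> Any:
    """Tokenize a list of words that is already split (e.g. a `tokens` field)."""
    # is_split_into_words=True tells the tokenizer that `words` is one
    # pre-split sequence, not a batch of separate sentences.
    return tokenizer(words, is_split_into_words=True)


def main() -> None:
    # Needs the transformers package, plus network access on the first download.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
    # Hypothetical sample standing in for one example's `tokens` field.
    words = ["Hugging", "Face", "is", "based", "in", "New", "York"]
    encoding = tokenize_words(tokenizer, words)
    # Convert ids back to subword strings; special tokens are included.
    print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
    # word_ids() maps each subword to the index of its source word
    # (available on fast tokenizers; None marks special tokens).
    print(encoding.word_ids())
```

Calling `main()` prints the subword tokens and, for each one, the index of the word it came from. The output usually contains more entries than the input list, because special tokens are added and some words are split into several subwords.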