If you trained your own tokenizer, you can create one from your vocabulary file:
from transformers import DistilBertTokenizer
my_tokenizer = DistilBertTokenizer(vocab_file="my_vocab_file.txt", do_lower_case=False, padding_side="left")
Keep in mind that the vocabulary of a custom tokenizer is different from the vocabulary generated by a pretrained model's tokenizer, so the same token can map to a different ID in each.
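To see why vocabularies are not interchangeable, here is a minimal standard-library sketch. It assumes a BERT-style vocabulary file with one token per line, where a token's ID is its 0-based line number. The helpers `write_vocab`, `load_vocab`, and `encode` and both token lists are hypothetical illustrations, not part of the `transformers` API. `encode` also does plain whole-word lookup rather than real WordPiece splitting.

```python
import os
import tempfile


def write_vocab(path, tokens):
    """Write a BERT-style vocabulary file: one token per line."""
    with open(path, "w", encoding="utf-8") as f:
        for tok in tokens:
            f.write(tok + "\n")


def load_vocab(path):
    """Map each token to its ID, taken as the token's 0-based line number."""
    vocab = {}
    with open(path, encoding="utf-8") as f:
        for idx, line in enumerate(f):
            vocab[line.rstrip("\n")] = idx
    return vocab


def encode(words, vocab):
    """Simplified whole-word lookup; unknown words fall back to [UNK]."""
    return [vocab.get(w, vocab["[UNK]"]) for w in words]


# Hypothetical vocabularies: one standing in for a pretrained tokenizer, one custom.
pretrained_tokens = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "the", "cat", "sat"]
custom_tokens = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "cat", "dog", "the"]

with tempfile.TemporaryDirectory() as tmp:
    pre_path = os.path.join(tmp, "pretrained_vocab.txt")
    my_path = os.path.join(tmp, "my_vocab_file.txt")
    write_vocab(pre_path, pretrained_tokens)
    write_vocab(my_path, custom_tokens)
    pretrained_vocab = load_vocab(pre_path)
    my_vocab = load_vocab(my_path)

sentence = ["the", "cat", "sat"]
print(encode(sentence, pretrained_vocab))  # [5, 6, 7]
print(encode(sentence, my_vocab))          # [7, 5, 1]
```

The same sentence produces different IDs under each vocabulary, and "sat" becomes `[UNK]` (ID 1) under the custom one. A model's embeddings are indexed by these IDs, so a model expects inputs encoded with the vocabulary it was trained on.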