If you trained your own tokenizer, you can instantiate it from your vocabulary file:
from transformers import DistilBertTokenizer
my_tokenizer = DistilBertTokenizer(vocab_file="my_vocab_file.txt", do_lower_case=False, padding_side="left")
Keep in mind that the vocabulary of a custom tokenizer differs from the vocabulary of a pretrained model's tokenizer, so the token IDs it produces will not match the IDs a pretrained model was trained on.
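To make the mismatch concrete, here is a minimal sketch in plain Python. It assumes a BERT-style vocabulary file, where each line holds one token and the line index is the token ID. The two vocabularies and the `load_vocab` and `encode` helpers are hypothetical and only illustrate the idea; they are not part of the Transformers API.

```python
# Hypothetical vocabularies standing in for the lines of two vocab files.
# In a BERT-style vocab file, each line is one token and its line index is the ID.
custom_vocab_lines = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "hello", "world"]
other_vocab_lines = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "the", "world", "hello"]


def load_vocab(lines):
    """Map each token to its line index (illustrative helper)."""
    return {token: index for index, token in enumerate(lines)}


def encode(tokens, vocab):
    """Look up each token's ID, falling back to the [UNK] ID (illustrative helper)."""
    unk_id = vocab["[UNK]"]
    return [vocab.get(token, unk_id) for token in tokens]


custom_vocab = load_vocab(custom_vocab_lines)
other_vocab = load_vocab(other_vocab_lines)

tokens = ["hello", "world"]
custom_ids = encode(tokens, custom_vocab)
other_ids = encode(tokens, other_vocab)

print(custom_ids)  # [4, 5]
print(other_ids)   # [6, 5]
```

The same tokens map to different IDs under each vocabulary, which is why a model should be paired with the tokenizer vocabulary it was trained with.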