This downloads the vocabulary that the model was pretrained with:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
```

Then pass your text to the tokenizer:

```python
encoded_input = tokenizer("Do not meddle in the affairs of wizards, for they are subtle and quick to anger.")
```
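To make the role of the vocabulary concrete, here is a minimal pure-Python sketch of the kind of dictionary a BERT-style tokenizer returns (`input_ids`, `token_type_ids`, `attention_mask`). It is an illustration under stated assumptions, not the library's implementation: `TOY_VOCAB` and `toy_encode` are hypothetical names, the vocabulary is invented, and words are split on whitespace, whereas the real tokenizer applies WordPiece subword splitting against the downloaded vocabulary.

```python
# Hypothetical toy vocabulary for illustration; the real BERT vocabulary is far larger.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "Do": 4, "not": 5, "meddle": 6}


def toy_encode(text, vocab=TOY_VOCAB):
    """Map whitespace-separated words to IDs, wrapped in [CLS] ... [SEP]."""
    words = text.split()
    # Lookup is case-sensitive, as in a cased model; unknown words map to [UNK].
    ids = [vocab.get(word, vocab["[UNK]"]) for word in words]
    input_ids = [vocab["[CLS]"]] + ids + [vocab["[SEP]"]]
    return {
        "input_ids": input_ids,
        # Single-sequence input: every token belongs to segment 0.
        "token_type_ids": [0] * len(input_ids),
        # No padding is added here, so every position is a real token.
        "attention_mask": [1] * len(input_ids),
    }


encoded = toy_encode("Do not meddle with wizards")
print(encoded["input_ids"])  # [2, 4, 5, 6, 1, 1, 3]
```

Words missing from the vocabulary ("with" and "wizards" here) collapse to the same `[UNK]` ID. That is why a tokenizer needs to be loaded with the same vocabulary the model saw during pretraining: the IDs only mean something to the model if they come from that mapping.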