Here's an example using the BERT tokenizer, which is a WordPiece tokenizer:
```python
from transformers import BertTokenizer

# Load the WordPiece tokenizer that ships with the cased BERT checkpoint
tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")

sequence = "A Titan RTX has 24GB of VRAM"
```
The tokenizer takes care of splitting the sequence into tokens available in the tokenizer vocabulary.
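Continuing the snippet above, a minimal sketch of what that splitting looks like; the exact subword pieces depend on the vocabulary of the checkpoint you load:

```python
tokenized_sequence = tokenizer.tokenize(sequence)
print(tokenized_sequence)
# Words in the vocabulary stay whole; strings not in it are split into
# subword pieces, with continuation pieces prefixed by "##", e.g.:
# ['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
```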