For example, the BERT model builds its two-sequence input as follows:
```
[CLS] SEQUENCE_A [SEP] SEQUENCE_B [SEP]
```
We can use our tokenizer to automatically generate such an input by passing the two sequences to `tokenizer` as two separate arguments (and not a list, as before), like this:
```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
sequence_a = "HuggingFace is based in NYC"
sequence_b = "Where is HuggingFace based?"
```
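
Continuing from the snippet above, a minimal sketch of the actual call and of decoding the result back to text (the variable names `encoded` and `decoded` are illustrative; the commented output assumes the BERT template shown earlier):

```python
# Passing the two sequences as separate arguments builds a single paired input
encoded = tokenizer(sequence_a, sequence_b)

# Decoding the input_ids reveals the special tokens the tokenizer inserted
decoded = tokenizer.decode(encoded["input_ids"])
print(decoded)
# [CLS] HuggingFace is based in NYC [SEP] Where is HuggingFace based? [SEP]
```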