File size: 387 Bytes
5fa1a76
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
Note:
If you want to reproduce the original tokenization process of the OpenAI GPT paper, you will need to install ftfy
and SpaCy:

pip install spacy ftfy==4.4.3
python -m spacy download en
If you don't install ftfy and SpaCy, the [OpenAIGPTTokenizer] will default to tokenize
using BERT's BasicTokenizer followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).