Note:
If you want to reproduce the original tokenization process of the OpenAI GPT paper, you will need to install ftfy and SpaCy:

    pip install spacy ftfy==4.4.3
    python -m spacy download en
If you don't install ftfy and SpaCy, the [OpenAIGPTTokenizer] will fall back to tokenizing with BERT's `BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most use cases, don't worry).
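For reference, here is a minimal sketch of loading the tokenizer and tokenizing a sentence, assuming the standard `openai-gpt` checkpoint name; whichever tokenization backend is available (ftfy + SpaCy, or the BERT-style fallback) is picked automatically:

```python
from transformers import OpenAIGPTTokenizer

# Download the GPT vocabulary and BPE merges; if ftfy and SpaCy are installed,
# the original paper's tokenization is used, otherwise the tokenizer falls back
# to BERT's BasicTokenizer followed by Byte-Pair Encoding.
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")

# Split a sentence into BPE sub-word tokens.
tokens = tokenizer.tokenize("Don't worry, tokenization works either way!")
print(tokens)
```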