File size: 387 Bytes
5fa1a76 |
1 2 3 4 5 6 7 8 |
Note: If you want to reproduce the original tokenization process of the OpenAI GPT paper, you will need to install ftfy and SpaCy: pip install spacy ftfy==4.4.3 python -m spacy download en If you don't install ftfy and SpaCy, the [OpenAIGPTTokenizer] will default to tokenize using BERT's BasicTokenizer followed by Byte-Pair Encoding (which should be fine for most usage, don't worry). |