Note:
If you want to reproduce the original tokenization process of the OpenAI GPT paper, you will need to install `ftfy` and `SpaCy`:

```bash
pip install spacy ftfy==4.4.3
python -m spacy download en
```
If you don't install `ftfy` and `SpaCy`, the [`OpenAIGPTTokenizer`] will default to tokenizing with BERT's `BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
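As a quick check of whichever tokenization path is active, here is a minimal sketch that loads the tokenizer from the `openai-gpt` checkpoint and tokenizes a sample sentence (the exact tokens can differ slightly depending on whether `ftfy` and `SpaCy` are installed):

```python
from transformers import OpenAIGPTTokenizer

# Load the pretrained BPE vocabulary; with ftfy and SpaCy installed this
# reproduces the original OpenAI GPT preprocessing, otherwise it falls
# back to BERT's BasicTokenizer followed by Byte-Pair Encoding.
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")

tokens = tokenizer.tokenize("Hello world, this is a test!")
print(tokens)

# Convert the tokens to input IDs for the model.
input_ids = tokenizer.convert_tokens_to_ids(tokens)
print(input_ids)
```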