|
|
|
OpenAI GPT |
|
|
|
Overview |
|
The OpenAI GPT model was proposed in Improving Language Understanding by Generative Pre-Training

by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. It's a causal (unidirectional) transformer

pre-trained using language modeling on a large corpus with long range dependencies, the Toronto Book Corpus.
|
The abstract from the paper is the following: |
|
Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, |
|
semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, |
|
labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to |
|
perform adequately. We demonstrate that large gains on these tasks can be realized by generative pretraining of a |
|
language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In |
|
contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve |
|
effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our |
|
approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms |
|
discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon |
|
the state of the art in 9 out of the 12 tasks studied. |
|
Write With Transformer is a webapp created and hosted by Hugging Face |
|
showcasing the generative capabilities of several models. GPT is one of them. |
|
This model was contributed by thomwolf. The original code can be found here. |
|
Usage tips |
|
|
|
GPT is a model with absolute position embeddings, so it's usually advised to pad the inputs on the right rather than

the left.
|
GPT was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next

token in a sequence. Leveraging this feature allows GPT to generate syntactically coherent text, as can be

observed in the run_generation.py example script and in the sketch below.
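
The following is a minimal sketch illustrating both tips; it assumes the openai-community/openai-gpt checkpoint on the Hugging Face Hub and a local PyTorch install.

```python
# Minimal generation sketch (assumes the openai-community/openai-gpt checkpoint
# and a PyTorch install; adapt the prompt and sampling settings as needed).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
model = AutoModelForCausalLM.from_pretrained("openai-community/openai-gpt")

# GPT uses absolute position embeddings, so pad on the right when batching.
tokenizer.padding_side = "right"

inputs = tokenizer("The history of natural language processing", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For batched inputs you would also need to define a padding token (for example, tokenizer.pad_token = tokenizer.unk_token), since the original GPT vocabulary does not ship with one.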
|
|
|
Note: |
|
If you want to reproduce the original tokenization process of the OpenAI GPT paper, you will need to install ftfy

and spaCy:
|
|
|
pip install spacy ftfy==4.4.3 |
|
python -m spacy download en |
|
If you don't install ftfy and spaCy, the [OpenAIGPTTokenizer] will default to tokenizing

using BERT's BasicTokenizer followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
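
As a quick check, the snippet below loads the slow tokenizer and inspects its output. It assumes the openai-community/openai-gpt checkpoint and works whether or not ftfy and spaCy are installed; only the normalization step differs.

```python
# Minimal tokenizer sketch (assumes the openai-community/openai-gpt checkpoint).
# Without ftfy/spaCy, the tokenizer falls back to BERT's BasicTokenizer + BPE.
from transformers import OpenAIGPTTokenizer

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-community/openai-gpt")

tokens = tokenizer.tokenize("Hello, how are you?")
print(tokens)  # GPT's vocabulary is lowercased, so expect lowercase BPE pieces

ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)
```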
|
Resources |
|
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with OpenAI GPT. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource. |
|
|
|
A blog post on outperforming OpenAI GPT-3 with SetFit for text classification.
|
See also: Text classification task guide |
|
|
|
A blog on how to Finetune a non-English GPT-2 Model with Hugging Face. |
|
A blog on How to generate text: using different decoding methods for language generation with Transformers with GPT-2. |
|
A blog on Training CodeParrot 🦜 from Scratch, a large GPT-2 model. |
|
A blog on Faster Text Generation with TensorFlow and XLA with GPT-2. |
|
A blog on How to train a Language Model with Megatron-LM with a GPT-2 model. |
|
A notebook on how to finetune GPT2 to generate lyrics in the style of your favorite artist. 🌎 |
|
A notebook on how to finetune GPT2 to generate tweets in the style of your favorite Twitter user. 🌎 |
|
Causal language modeling chapter of the 🤗 Hugging Face Course. |
|
[OpenAIGPTLMHeadModel] is supported by this causal language modeling example script, text generation example script and notebook. |
|
[TFOpenAIGPTLMHeadModel] is supported by this causal language modeling example script and notebook. |
|
See also: Causal language modeling task guide |
|
|
|
A course material on Byte-Pair Encoding tokenization. |
|
|
|
OpenAIGPTConfig |
|
[[autodoc]] OpenAIGPTConfig |
|
OpenAIGPTTokenizer |
|
[[autodoc]] OpenAIGPTTokenizer |
|
- save_vocabulary |
|
OpenAIGPTTokenizerFast |
|
[[autodoc]] OpenAIGPTTokenizerFast |
|
OpenAI specific outputs |
|
[[autodoc]] models.openai.modeling_openai.OpenAIGPTDoubleHeadsModelOutput |
|
[[autodoc]] models.openai.modeling_tf_openai.TFOpenAIGPTDoubleHeadsModelOutput |
|
|
|
OpenAIGPTModel |
|
[[autodoc]] OpenAIGPTModel |
|
- forward |
|
OpenAIGPTLMHeadModel |
|
[[autodoc]] OpenAIGPTLMHeadModel |
|
- forward |
|
OpenAIGPTDoubleHeadsModel |
|
[[autodoc]] OpenAIGPTDoubleHeadsModel |
|
- forward |
|
OpenAIGPTForSequenceClassification |
|
[[autodoc]] OpenAIGPTForSequenceClassification |
|
- forward |
|
|
|
TFOpenAIGPTModel |
|
[[autodoc]] TFOpenAIGPTModel |
|
- call |
|
TFOpenAIGPTLMHeadModel |
|
[[autodoc]] TFOpenAIGPTLMHeadModel |
|
- call |
|
TFOpenAIGPTDoubleHeadsModel |
|
[[autodoc]] TFOpenAIGPTDoubleHeadsModel |
|
- call |
|
TFOpenAIGPTForSequenceClassification |
|
[[autodoc]] TFOpenAIGPTForSequenceClassification |
|
- call |
|
|
|
|