# OpenAI GPT2

## Overview

The OpenAI GPT-2 model was proposed in *Language Models are Unsupervised Multitask Learners* by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever from OpenAI. It's a causal (unidirectional) transformer pretrained using language modeling on a very large corpus of ~40 GB of text data.

The abstract from the paper is the following:

*GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset[1] of 8 million web pages. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than 10X the amount of data.*

Write With Transformer is a webapp created and hosted by Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five different sizes: small, medium, large, xl and a distilled version of the small checkpoint: *distilgpt-2*.

This model was contributed by thomwolf. The original code can be found here.
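All of the sizes above are published as checkpoints on the Hugging Face Hub. A minimal loading sketch, assuming the Hub identifiers `gpt2`, `gpt2-medium`, `gpt2-large`, `gpt2-xl` and `distilgpt2`:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# assumed Hub identifiers for the five sizes mentioned above
checkpoint = "gpt2"  # or "gpt2-medium", "gpt2-large", "gpt2-xl", "distilgpt2"

tokenizer = GPT2Tokenizer.from_pretrained(checkpoint)
model = GPT2LMHeadModel.from_pretrained(checkpoint)

print(model.config.n_layer, model.config.n_embd)  # depth and hidden size of the loaded variant
```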
## Usage tips

GPT-2 is a model with absolute position embeddings, so it's usually advised to pad the inputs on the right rather than the left.
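A minimal padding sketch, assuming the small `gpt2` checkpoint (GPT-2 ships without a dedicated padding token, so the end-of-text token is reused here):

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token   # GPT-2 has no dedicated pad token
tokenizer.padding_side = "right"            # pad on the right, per the tip above

batch = tokenizer(
    ["Hello world", "A slightly longer example sentence"],
    padding=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)   # (2, longest_sequence_length)
print(batch["attention_mask"])    # zeros mark the padded positions
```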
GPT-2 was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next token in a sequence. Leveraging this feature allows GPT-2 to generate syntactically coherent text, as can be observed in the run_generation.py example script.
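A short generation sketch along those lines, assuming the small `gpt2` checkpoint; the decoding parameters are illustrative, not prescribed by the script:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("GPT-2 is a model that", return_tensors="pt")

# sample a continuation of the prompt
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        top_k=50,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```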
The model can take the `past_key_values` (for PyTorch) or `past` (for TF) as input; these are the previously computed key/value attention pairs. Using this (`past_key_values` or `past`) value prevents the model from re-computing pre-computed values in the context of text generation. For PyTorch, see the `past_key_values` argument of the [GPT2Model.forward] method, or for TF the `past` argument of the [TFGPT2Model.call] method for more information on its usage.
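A PyTorch sketch of reusing the cache during incremental decoding, assuming the small `gpt2` checkpoint and greedy selection of the next token:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The cache avoids recomputing", return_tensors="pt").input_ids

with torch.no_grad():
    # first pass over the full prompt: keep the returned key/value cache
    outputs = model(input_ids, use_cache=True)
    past_key_values = outputs.past_key_values

    # next step: feed only the newly chosen token together with the cache
    next_token = outputs.logits[:, -1:].argmax(dim=-1)
    outputs = model(next_token, past_key_values=past_key_values, use_cache=True)

print(outputs.logits.shape)  # logits for the single new position only
```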
Enabling the `scale_attn_by_inverse_layer_idx` and `reorder_and_upcast_attn` flags will apply the training stability improvements from Mistral (for PyTorch only).
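For instance, a sketch of turning both flags on via the configuration, here on a freshly initialized model intended for training:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# enable the Mistral training-stability tweaks (PyTorch implementation only)
config = GPT2Config(
    scale_attn_by_inverse_layer_idx=True,
    reorder_and_upcast_attn=True,
)
model = GPT2LMHeadModel(config)  # randomly initialized, suited for (re)training
```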
## Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with GPT2. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
- A blog on how to Finetune a non-English GPT-2 Model with Hugging Face.
- A blog on How to generate text: using different decoding methods for language generation with Transformers with GPT-2.
- A blog on Training CodeParrot 🦜 from Scratch, a large GPT-2 model.
- A blog on Faster Text Generation with TensorFlow and XLA with GPT-2.
- A blog on How to train a Language Model with Megatron-LM with a GPT-2 model.
- A notebook on how to finetune GPT2 to generate lyrics in the style of your favorite artist. 🌎
- A notebook on how to finetune GPT2 to generate tweets in the style of your favorite Twitter user. 🌎
- Causal language modeling chapter of the 🤗 Hugging Face Course.
- [GPT2LMHeadModel] is supported by this causal language modeling example script, text generation example script, and notebook.
- [TFGPT2LMHeadModel] is supported by this causal language modeling example script and notebook.
- [FlaxGPT2LMHeadModel] is supported by this causal language modeling example script and notebook.
- Text classification task guide
- Token classification task guide
- Causal language modeling task guide
## GPT2Config

[[autodoc]] GPT2Config

## GPT2Tokenizer

[[autodoc]] GPT2Tokenizer
    - save_vocabulary

## GPT2TokenizerFast

[[autodoc]] GPT2TokenizerFast

## GPT2 specific outputs

[[autodoc]] models.gpt2.modeling_gpt2.GPT2DoubleHeadsModelOutput

[[autodoc]] models.gpt2.modeling_tf_gpt2.TFGPT2DoubleHeadsModelOutput

## GPT2Model

[[autodoc]] GPT2Model
    - forward

## GPT2LMHeadModel

[[autodoc]] GPT2LMHeadModel
    - forward

## GPT2DoubleHeadsModel

[[autodoc]] GPT2DoubleHeadsModel
    - forward

## GPT2ForQuestionAnswering

[[autodoc]] GPT2ForQuestionAnswering
    - forward

## GPT2ForSequenceClassification

[[autodoc]] GPT2ForSequenceClassification
    - forward

## GPT2ForTokenClassification

[[autodoc]] GPT2ForTokenClassification
    - forward

## TFGPT2Model

[[autodoc]] TFGPT2Model
    - call

## TFGPT2LMHeadModel

[[autodoc]] TFGPT2LMHeadModel
    - call

## TFGPT2DoubleHeadsModel

[[autodoc]] TFGPT2DoubleHeadsModel
    - call

## TFGPT2ForSequenceClassification

[[autodoc]] TFGPT2ForSequenceClassification
    - call

## TFSequenceClassifierOutputWithPast

[[autodoc]] modeling_tf_outputs.TFSequenceClassifierOutputWithPast

## TFGPT2Tokenizer

[[autodoc]] TFGPT2Tokenizer
## FlaxGPT2Model

[[autodoc]] FlaxGPT2Model
    - __call__

## FlaxGPT2LMHeadModel

[[autodoc]] FlaxGPT2LMHeadModel
    - __call__
|