|
|
|
GPT-J |
|
Overview |
|
The GPT-J model was released in the kingoflolz/mesh-transformer-jax repository by Ben Wang and Aran Komatsuzaki. It is a GPT-2-like causal language model trained on the Pile dataset.
|
This model was contributed by Stella Biderman. |
|
Usage tips |
|
|
|
To load GPT-J in float32 one would need at least 2x model size of RAM: 1x for the initial weights and another 1x to load the checkpoint. So for GPT-J (about 24GB of float32 weights) it would take at least 48GB of RAM just to load the model. To reduce the RAM usage there are a few options. The torch_dtype argument can be used to initialize the model in half-precision (on a CUDA device only). There is also a "float16" branch which stores the weights in fp16 and can be used to further minimize the RAM usage:
|
|
|
```python
from transformers import GPTJForCausalLM
import torch

device = "cuda"
model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",
    torch_dtype=torch.float16,
).to(device)
```
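To verify how much memory the loaded weights actually use, one can continue the example above with get_memory_footprint, a standard PreTrainedModel helper (the ~12GB figure in the comment is an expectation for fp16 weights, not a guaranteed output):

```python
# Continuing the example above: report how much memory the fp16 weights use.
# 6B parameters * 2 bytes/param ≈ 12GB, well within a 16GB GPU for inference.
print(f"{model.get_memory_footprint() / 1e9:.1f} GB")
```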
|
|
|
The model should fit on a 16GB GPU for inference. Training or fine-tuning takes much more GPU RAM: the Adam optimizer, for example, keeps four copies of the model in memory (the model itself, the gradients, and the running average and squared average of the gradients), so it needs at least 4x model size of GPU memory, even with mixed precision, since gradient updates are kept in fp32. This does not include the activations and data batches, which require yet more GPU RAM. One should therefore explore solutions such as DeepSpeed to train/fine-tune the model. Another option is to use the original codebase to train/fine-tune the model on TPU and then convert the model to Transformers format for inference. Instructions for that can be found here.
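As a rough back-of-the-envelope check of the 4x figure (a sketch only; the exact numbers vary with the optimizer implementation and precision settings):

```python
# Rough memory estimate for full fine-tuning of GPT-J with Adam in fp32.
# Ignores activations, data batches, and framework overhead.
n_params = 6e9  # GPT-J has roughly 6 billion parameters
fp32_bytes = 4

weights = n_params * fp32_bytes   # model weights
grads = n_params * fp32_bytes     # gradients
adam_avg = n_params * fp32_bytes  # running average of the gradients
adam_sq = n_params * fp32_bytes   # running squared average of the gradients

total_gb = (weights + grads + adam_avg + adam_sq) / 1e9
print(f"~{total_gb:.0f} GB")  # ~96 GB, i.e. 4x the ~24 GB fp32 model size
```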
|
|
|
Although the embedding matrix has a size of 50400, only 50257 entries are used by the GPT-2 tokenizer. These extra tokens are added for the sake of efficiency on TPUs. To avoid a mismatch between the embedding matrix size and the vocab size, the tokenizer for GPT-J contains 143 extra tokens <|extratoken_1|> through <|extratoken_143|>, so the vocab_size of the tokenizer also becomes 50400.
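This can be checked without downloading the full weights; the snippet below (a minimal sketch) only fetches the config and tokenizer files:

```python
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

print(config.vocab_size)  # 50400 -- size of the embedding matrix
print(len(tokenizer))     # 50400 -- 50257 GPT-2 tokens + 143 extra tokens
```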
|
|
|
Usage examples |
|
The [~generation.GenerationMixin.generate] method can be used to generate text using the GPT-J model.
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

prompt = (
    "In a shocking finding, scientists discovered a herd of unicorns living in a remote, "
    "previously unexplored valley, in the Andes Mountains. Even more surprising to the "
    "researchers was the fact that the unicorns spoke perfect English."
)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.9,
    max_length=100,
)
gen_text = tokenizer.batch_decode(gen_tokens)[0]
```
|
|
|
or in float16 precision: |
|
```python
from transformers import GPTJForCausalLM, AutoTokenizer
import torch

device = "cuda"
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16).to(device)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

prompt = (
    "In a shocking finding, scientists discovered a herd of unicorns living in a remote, "
    "previously unexplored valley, in the Andes Mountains. Even more surprising to the "
    "researchers was the fact that the unicorns spoke perfect English."
)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.9,
    max_length=100,
)
gen_text = tokenizer.batch_decode(gen_tokens)[0]
```
|
|
|
Resources |
|
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with GPT-J. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource. |
|
|
|
- Description of GPT-J.
- A blog on how to Deploy GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker.
- A blog on how to Accelerate GPT-J inference with DeepSpeed-Inference on GPUs.
- A blog post introducing GPT-J-6B: 6B JAX-Based Transformer. 🌎
- A notebook for GPT-J-6B Inference Demo. 🌎
- Another notebook demonstrating Inference with GPT-J-6B.
- Causal language modeling chapter of the 🤗 Hugging Face Course.
- [GPTJForCausalLM] is supported by this causal language modeling example script, text generation example script, and notebook.
- [TFGPTJForCausalLM] is supported by this causal language modeling example script and notebook.
- [FlaxGPTJForCausalLM] is supported by this causal language modeling example script and notebook.
|
|
|
Documentation resources |
|
- Text classification task guide |
|
- Question answering task guide |
|
- Causal language modeling task guide |
|
GPTJConfig |
|
[[autodoc]] GPTJConfig |
|
- all |
|
|
|
GPTJModel |
|
[[autodoc]] GPTJModel |
|
- forward |
|
GPTJForCausalLM |
|
[[autodoc]] GPTJForCausalLM |
|
- forward |
|
GPTJForSequenceClassification |
|
[[autodoc]] GPTJForSequenceClassification |
|
- forward |
|
GPTJForQuestionAnswering |
|
[[autodoc]] GPTJForQuestionAnswering |
|
- forward |
|
|
|
TFGPTJModel |
|
[[autodoc]] TFGPTJModel |
|
- call |
|
TFGPTJForCausalLM |
|
[[autodoc]] TFGPTJForCausalLM |
|
- call |
|
TFGPTJForSequenceClassification |
|
[[autodoc]] TFGPTJForSequenceClassification |
|
- call |
|
TFGPTJForQuestionAnswering |
|
[[autodoc]] TFGPTJForQuestionAnswering |
|
- call |
|
|
|
FlaxGPTJModel |
|
[[autodoc]] FlaxGPTJModel |
|
- call |
|
FlaxGPTJForCausalLM |
|
[[autodoc]] FlaxGPTJForCausalLM |
|
- call |
|
|
|
|