There is also a `float16` branch which stores the fp16 weights, which can be used to further minimize the RAM usage:

```python
from transformers import GPTJForCausalLM
import torch

device = "cuda"
model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",
    torch_dtype=torch.float16,
).to(device)
```

The model should fit on a 16GB GPU for inference.
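As a quick sanity check that the fp16 model loaded this way works, a minimal generation sketch could look like the following; the prompt text and `max_new_tokens` value are illustrative, and `model` is the fp16 instance loaded above:

```python
from transformers import AutoTokenizer

# Load the matching tokenizer for GPT-J
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

prompt = "The key benefit of fp16 inference is"  # example prompt, any text works
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Greedy decoding of up to 50 new tokens
output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```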