There is also a `float16` branch that stores the fp16 weights, which can be used to further reduce RAM usage:

```python
from transformers import GPTJForCausalLM
import torch

device = "cuda"

# Loading from the "float16" revision downloads the fp16 weights,
# roughly halving the memory needed compared to the fp32 checkpoint.
model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",
    torch_dtype=torch.float16,
).to(device)
```

The model should fit on a 16 GB GPU for inference.
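
As a quick sanity check, a minimal generation call with the fp16 model loaded above might look like the sketch below; the prompt and generation parameters are illustrative, not prescriptive:

```python
from transformers import AutoTokenizer

# The tokenizer is shared with the fp32 checkpoint.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

prompt = "In a shocking finding, scientists discovered"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Sample up to 50 new tokens from the fp16 model loaded above.
output_ids = model.generate(input_ids, do_sample=True, max_new_tokens=50, temperature=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```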