Use the device_map parameter to specify where to place the model:

from transformers import AutoModelForCausalLM

model_id = "TheBloke/zephyr-7B-alpha-AWQ"
# Load the AWQ checkpoint and place the whole model on the first CUDA device.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda:0")
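
If you would rather let the library distribute layers across whatever devices are available, device_map="auto" is a common alternative (this assumes the accelerate package is installed):

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")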

Loading an AWQ-quantized model automatically sets the remaining, non-quantized weights to fp16 by default for performance reasons.
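
If you need those weights in a different precision, from_pretrained accepts a torch_dtype argument. A minimal sketch, assuming the same checkpoint and device as above, that loads the non-quantized weights in fp32 instead:

from transformers import AutoModelForCausalLM
import torch

model_id = "TheBloke/zephyr-7B-alpha-AWQ"
# The AWQ-quantized weights stay quantized; only the remaining
# weights are loaded in fp32 instead of the fp16 default.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,
    device_map="cuda:0",
)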