Use the `device_map` parameter to specify where to place the model:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/zephyr-7B-alpha-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda:0")
```

Loading an AWQ-quantized model automatically sets the remaining non-quantized weights to fp16 by default for performance reasons.
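As a quick sanity check, the sketch below continues from the snippet above and runs one generation step. The prompt string and `max_new_tokens=50` are arbitrary illustrative choices, and inspecting a parameter's dtype is just one way to observe the fp16 default; the exact parameter returned first depends on the model architecture.

```py
import torch

# Non-quantized weights (e.g. embeddings) should report torch.float16 after loading.
print(next(model.parameters()).dtype)

# Inputs must be moved to the same device the model was placed on.
inputs = tokenizer("What is AWQ quantization?", return_tensors="pt").to("cuda:0")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```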