Make sure you have `autoawq` installed:

```bash
pip install autoawq
```
AWQ-quantized models can be identified by checking the `quantization_config` attribute in the model's `config.json` file:
```json
{
  "_name_or_path": "/workspace/process/huggingfaceh4_zephyr-7b-alpha/source",
  "architectures": [
    "MistralForCausalLM"
  ],
  "quantization_config": {
    "quant_method": "awq",
    "zero_point": true,
    "group_size": 128,
    "bits": 4,
    "version": "gemm"
  }
}
```
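The check described above can also be done programmatically. The following is a minimal sketch that uses only the Python standard library; the helper name `is_awq_quantized` and the temporary `config.json` written for the demo are illustrative assumptions, not part of the Transformers or AutoAWQ API.

```python
import json
import tempfile
from pathlib import Path


def is_awq_quantized(config_path):
    """Return True if the config.json at config_path declares AWQ quantization.

    Hypothetical helper for illustration; not a Transformers or AutoAWQ function.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # Unquantized checkpoints have no quantization_config entry at all.
    quant_config = config.get("quantization_config") or {}
    return quant_config.get("quant_method") == "awq"


# Demo config mirroring the fields shown in the example above.
sample_config = {
    "architectures": ["MistralForCausalLM"],
    "quantization_config": {
        "quant_method": "awq",
        "zero_point": True,
        "group_size": 128,
        "bits": 4,
        "version": "gemm",
    },
}

with tempfile.TemporaryDirectory() as tmp_dir:
    config_file = Path(tmp_dir) / "config.json"
    config_file.write_text(json.dumps(sample_config), encoding="utf-8")
    print(is_awq_quantized(config_file))  # True
```

For a checkpoint already on disk, you would pass the path to its own `config.json` instead of the temporary file used here.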
A quantized model is loaded with the [`~PreTrainedModel.from_pretrained`] method.