Make sure you have autoawq installed:

```bash
pip install autoawq
```

AWQ-quantized models can be identified by checking the `quantization_config` attribute in the model's config.json file:

```json
{
  "_name_or_path": "/workspace/process/huggingfaceh4_zephyr-7b-alpha/source",
  "architectures": [
    "MistralForCausalLM"
  ],
  "quantization_config": {
    "quant_method": "awq",
    "zero_point": true,
    "group_size": 128,
    "bits": 4,
    "version": "gemm"
  }
}
```

A quantized model is loaded with the [`~PreTrainedModel.from_pretrained`] method.
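For example, a minimal sketch of loading an AWQ checkpoint (the `TheBloke/zephyr-7B-alpha-AWQ` model ID is used here purely as an illustrative AWQ repository; any checkpoint whose config carries an `awq` `quantization_config` behaves the same way):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example AWQ checkpoint; substitute any repository whose config.json
# declares "quant_method": "awq".
model_id = "TheBloke/zephyr-7B-alpha-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# from_pretrained reads the quantization_config and loads the 4-bit AWQ
# weights; device_map="auto" places them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```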
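Conversely, to confirm that a checkpoint is AWQ-quantized before downloading the full weights, the same `quantization_config` field can be inspected programmatically. A sketch, assuming the field is exposed as a plain dict on the loaded config:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("TheBloke/zephyr-7B-alpha-AWQ")
# quantization_config is absent on unquantized models, so guard with getattr.
quant_config = getattr(config, "quantization_config", None)
if quant_config is not None and quant_config.get("quant_method") == "awq":
    print("AWQ-quantized model")
```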