```python
import torch
from transformers import AwqConfig, AutoModelForCausalLM

model_id = "TheBloke/Mistral-7B-OpenOrca-AWQ"

quantization_config = AwqConfig(
    bits=4,
    fuse_max_seq_len=512,
    do_fuse=True,
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config).to(0)
```

For architectures that don't support fused modules yet, you need to create a custom fusing mapping to define which modules need to be fused with the `modules_to_fuse` parameter.
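For example, a custom mapping could look like the sketch below. The checkpoint name and the architecture-specific values (module names, head counts, hidden size) are illustrative assumptions and must match the actual model you are loading.

```python
import torch
from transformers import AwqConfig, AutoModelForCausalLM

# Hypothetical AWQ checkpoint used for illustration; replace with your model.
model_id = "TheBloke/Yi-34B-AWQ"

quantization_config = AwqConfig(
    bits=4,
    fuse_max_seq_len=512,
    modules_to_fuse={
        # Module names and sizes below are assumptions for this sketch;
        # they must match the target model's architecture.
        "attention": ["q_proj", "k_proj", "v_proj", "o_proj"],
        "layernorm": ["ln1", "ln2", "norm"],
        "mlp": ["gate_proj", "up_proj", "down_proj"],
        "use_alibi": False,
        "num_attention_heads": 56,
        "num_key_value_heads": 8,
        "hidden_size": 7168,
    },
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config).to(0)
```

Note that `do_fuse` is not set here: passing `modules_to_fuse` already tells the library which modules to fuse for this unsupported architecture.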