To find the best threshold for your model, we recommend experimenting with the `llm_int8_threshold` parameter in [`BitsAndBytesConfig`]:

```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "bigscience/bloom-1b7"

# enable 8-bit loading and raise the outlier threshold above its default
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=10.0,
)

model_8bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,
)
```

## Skip module conversion

For some models, like Jukebox, you don't need to quantize every module to 8-bit; converting everything can actually cause instability.
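As a minimal sketch, modules can be excluded from conversion with the `llm_int8_skip_modules` parameter of [`BitsAndBytesConfig`]; the `"lm_head"` entry below is an illustrative choice, so substitute whichever modules are unstable for your model:

```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "bigscience/bloom-1b7"

# keep the listed modules in their original precision instead of converting
# them to 8-bit; "lm_head" is illustrative, not required for every model
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["lm_head"],
)

model_8bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,
)
```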