With Jukebox, there are several `lm_head` modules that should be skipped using the `llm_int8_skip_modules` parameter in [`BitsAndBytesConfig`]:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "bigscience/bloom-1b7"

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,  # quantize the remaining modules to 8-bit
    llm_int8_skip_modules=["lm_head"],
)

model_8bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,
)
```

## Finetuning

With the PEFT library, you can finetune large models like `flan-t5-large` and `facebook/opt-6.7b` with 8-bit quantization.
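
As a rough sketch of what this looks like, the example below loads `facebook/opt-6.7b` in 8-bit and attaches LoRA adapters with PEFT. The LoRA hyperparameters and `target_modules` shown here are illustrative choices, not values prescribed by this guide.

```py
# A minimal sketch of 8-bit finetuning with PEFT; the LoRA settings below
# (r, lora_alpha, target_modules, dropout) are illustrative and should be tuned.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

# Cast norm layers to fp32 and enable gradient checkpointing for stable training
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections in OPT
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Only the LoRA adapter weights are trained; the 8-bit base model stays frozen
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

From here, the model can be trained with [`Trainer`] or a regular PyTorch training loop; only the adapter parameters receive gradients, so memory usage stays close to that of 8-bit inference.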