Some models, such as Jukebox, have several `lm_head` modules that should not be quantized. Pass their names to the `llm_int8_skip_modules` parameter in [BitsAndBytesConfig] to keep them in full precision:
```py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "bigscience/bloom-1b7"

# Quantize to 8-bit, but keep every module named "lm_head" in full precision.
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["lm_head"],
)

model_8bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,
)
```
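As a quick sanity check (not part of the original snippet, and the module paths below are specific to the BLOOM architecture), you can confirm that the skipped module was left untouched while other linear layers were converted:

```py
# Skipped modules remain regular torch.nn.Linear layers,
# while converted layers become bitsandbytes Linear8bitLt modules.
print(type(model_8bit.lm_head))
print(type(model_8bit.transformer.h[0].self_attention.query_key_value))
```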
## Finetuning
With the PEFT library, you can finetune large models like flan-t5-large and facebook/opt-6.7b with 8-bit quantization.
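A minimal sketch of what that looks like, assuming the `peft` library is installed; the LoRA hyperparameters and `target_modules` below are illustrative choices for an OPT-style model, not prescribed values:

```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 8-bit.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

# Prepare the quantized model for training (casts norm layers to fp32, enables input grads).
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters; target_modules vary by architecture.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Only the small LoRA adapter weights are trained, so the 8-bit base model stays frozen and memory usage remains low.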