For example, to enable offloading for the bigscience/bloom-1b7 model, start by creating a [BitsAndBytesConfig]:

```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True)
```
Design a custom device map to fit everything on your GPU except for the `lm_head`, which you'll dispatch to the CPU:
```py
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}
```
Now load your model with the custom `device_map` and `quantization_config`:
```py
model_8bit = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    device_map=device_map,
    quantization_config=quantization_config,
)
```
## Outlier threshold
An "outlier" is a hidden state value greater than a certain threshold, and these values are computed in fp16. |