	If this is the case, try passing the max_memory parameter to cap how much memory is used on each device. Integer keys are GPU indices, and the "cpu" key limits CPU RAM:
	py
	quantized_model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    device_map="auto",
	    max_memory={0: "30GiB", 1: "46GiB", "cpu": "30GiB"},
	    quantization_config=gptq_config,
	)
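
The max_memory mapping above can also be built programmatically instead of being typed by hand. The following is a minimal sketch in plain Python: build_max_memory is a hypothetical helper (not part of transformers or accelerate), and the 2 GiB of headroom left free on each GPU is an assumed value that you would tune for your own hardware.

```python
def build_max_memory(gpu_total_gib, cpu_gib, headroom_gib=2):
    """Build a max_memory-style dict (hypothetical helper, not a library API).

    gpu_total_gib: total memory of each GPU in GiB, ordered by device index.
    cpu_gib:       amount of CPU RAM to allow, in GiB.
    headroom_gib:  memory left free on every GPU (assumed value).
    """
    max_memory = {}
    for index, total in enumerate(gpu_total_gib):
        usable = total - headroom_gib
        if usable <= 0:
            raise ValueError(f"GPU {index} has no memory left after headroom")
        # Integer keys identify GPUs by index; values are strings such as "30GiB".
        max_memory[index] = f"{usable}GiB"
    # The "cpu" key caps how much CPU RAM may be used.
    max_memory["cpu"] = f"{cpu_gib}GiB"
    return max_memory


# Example: two GPUs with 32 GiB and 48 GiB, plus 30 GiB of CPU RAM.
max_memory = build_max_memory([32, 48], cpu_gib=30)
print(max_memory)  # {0: '30GiB', 1: '46GiB', 'cpu': '30GiB'}
```

The resulting dict has the same shape as the literal passed to from_pretrained above, so it could be supplied as max_memory=max_memory.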

	Depending on your hardware, it can take some time to quantize a model from scratch.