You can check your memory footprint with the `get_memory_footprint` method:

```py
print(model.get_memory_footprint())
```
Quantized models can be loaded with the [`~PreTrainedModel.from_pretrained`] method without needing to specify the `load_in_8bit` or `load_in_4bit` parameters:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("{your_username}/bloom-560m-8bit", device_map="auto")
```
## 8-bit
Learn more about the details of 8-bit quantization in this blog post!