Spaces:

Ahmadzei
/

RAG

Runtime error

added 3 more tables for large emb model

5fa1a76 over 1 year ago

306 Bytes

	from transformers import AutoModelForCausalLM
	model = AutoModelForCausalLM.from_pretrained("{your_username}/opt-125m-gptq", device_map="auto")

	ExLlama
	ExLlama is a Python/C++/CUDA implementation of the Llama model that is designed for faster inference with 4-bit GPTQ weights (check out these benchmarks).