# Llama-3.1-Nemotron-Nano-4B-v1.1 - GPTQ 4-bit Quantized
This is a 4-bit GPTQ quantized version of [nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1), quantized with `auto-gptq`.
## How to use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Loading a GPTQ checkpoint through transformers typically requires
# the optimum and auto-gptq packages to be installed.
model_id = "avinashhm/Llama-3.1-Nemotron-Nano-4B-v1.1-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```
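Once loaded, the model can be used for generation in the standard transformers way. Below is a minimal sketch; the prompt text and generation settings are illustrative, not prescribed by this model card.

```python
# Illustrative prompt; adjust max_new_tokens and sampling settings as needed.
prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding of up to 64 new tokens.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```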