# Llama-3.1-Nemotron-Nano-4B-v1.1 - GPTQ 4-bit Quantized

This is a 4-bit GPTQ-quantized version of [nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1), produced with the auto-gptq library.

## How to use
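Loading GPTQ checkpoints through transformers typically requires the optimum integration plus a GPTQ kernel backend such as auto-gptq (e.g. `pip install transformers optimum auto-gptq`); the exact requirements depend on your transformers version.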

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "avinashhm/Llama-3.1-Nemotron-Nano-4B-v1.1-GPTQ"

# Load the tokenizer and the quantized model; device_map="auto" places
# layers on the available GPU(s) / CPU automatically.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```
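
Once loaded, the model behaves like any causal LM. Below is a minimal generation sketch; it assumes this repo ships the upstream Llama 3.1 chat template with the tokenizer, and the prompt is purely illustrative:

```python
import torch

# Format a single-turn chat prompt (assumes a chat template is bundled
# with the tokenizer; adjust if this repo does not include one)
messages = [{"role": "user", "content": "Explain GPTQ quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```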
## Model details

- Format: Safetensors
- Model size: 1.29B params
- Tensor types: I32 · F16