# Llama-3.1-Nemotron-Nano-4B-v1.1 - GPTQ 4-bit Quantized

This is a 4-bit GPTQ-quantized version of [nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1), produced with the auto-gptq library.

## How to use
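Loading GPTQ checkpoints through transformers typically requires the optimum integration plus a GPTQ kernel backend such as auto-gptq (e.g. `pip install transformers optimum auto-gptq`); the exact requirements depend on your transformers version.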

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "avinashhm/Llama-3.1-Nemotron-Nano-4B-v1.1-GPTQ"

# Load the tokenizer and the quantized model; device_map="auto" places
# layers on the available GPU(s) / CPU automatically.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```
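
Once loaded, the model behaves like any causal LM. Below is a minimal generation sketch; it assumes this repo ships the upstream Llama 3.1 chat template with the tokenizer, and the prompt is purely illustrative:

```python
import torch

# Format a single-turn chat prompt (assumes a chat template is bundled
# with the tokenizer; adjust if this repo does not include one)
messages = [{"role": "user", "content": "Explain GPTQ quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```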
## Model details

- Format: Safetensors
- Model size: 1.29B params
- Tensor types: I32 · F16