Type of quantization

#1 by systemicAnomaly - opened

Hi,
Is this an AWQ or a GPTQ W4A16 quantization?

Do you need to be able to load the entire model into VRAM to quantize this with llmcompressor? I was looking for an AWQ quant.
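For reference, producing a W4A16 quant with llmcompressor usually follows the `oneshot` flow below. This is a minimal sketch based on the upstream GPTQ examples; the model ID and calibration dataset are placeholders, and the exact import path may vary between llmcompressor versions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# W4A16: 4-bit weights, 16-bit activations; the lm_head stays unquantized.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="open_platypus",        # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

SAVE_DIR = MODEL_ID.split("/")[-1] + "-W4A16"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```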

I wanted to create an AWQ quant with the gates left in full precision, but it looks like you do need to be able to load the full model at least into system RAM, if not VRAM, so I could not do it. This quant I could do sequentially. Roughly what I was attempting is sketched below.
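A minimal sketch of that recipe, assuming llmcompressor's `AWQModifier` and assuming the MoE router gates match a pattern like `re:.*mlp.gate$` (the actual module names depend on the architecture, so check `model.named_modules()` first):

```python
from llmcompressor.modifiers.awq import AWQModifier

# AWQ W4A16 recipe that keeps the MoE gates (and lm_head) in full precision.
# The gate regex is an assumption; verify it against your model's modules.
recipe = AWQModifier(
    targets=["Linear"],
    scheme="W4A16_ASYM",
    ignore=[
        "lm_head",
        "re:.*mlp\\.gate$",  # assumed pattern for MoE router gates
    ],
)
```

Leaving the gates unquantized is a common choice for MoE models, since the router projections are tiny relative to the experts but sensitive to quantization error.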
