Type of quantization
#1 opened by systemicAnomaly
Hi,
Is this an AWQ or a GPTQ W4A16 quantization?
Do you need to be able to load the entire model into VRAM to quantize it with llmcompressor? I was looking for an AWQ quant.
I wanted to create an AWQ quant while leaving the gates in full precision, but it looks like you do need to be able to load the full model into at least system RAM, if not VRAM, so I could not do it. This quant, by contrast, I could run sequentially, layer by layer, as sketched below.
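For reference, here is a minimal sketch of the sequential GPTQ-style W4A16 path, assuming the `oneshot` API from the llmcompressor README. The model id and the gate regex are placeholders, not the actual recipe used for this repo:

```python
# Minimal sketch of a sequential W4A16 (GPTQ-style) quant with llmcompressor.
# Assumes the oneshot API from the llmcompressor README; the model id and the
# gate ignore pattern are hypothetical — adjust them to your architecture.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "some-org/some-moe-model"  # placeholder model id

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# W4A16 on all Linear layers, keeping lm_head and the MoE router gates
# in full precision via regex ignore patterns.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head", "re:.*mlp.gate$"],  # gate pattern is architecture-specific
)

# oneshot calibrates and quantizes layer by layer, onloading one layer at a
# time to the GPU, so the whole model does not need to fit in VRAM at once.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

SAVE_DIR = MODEL_ID.split("/")[-1] + "-W4A16"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```

This layer-by-layer calibration is why the GPTQ path fits on a single GPU even when the full model would not, whereas an AWQ run that needs the whole model resident in system RAM or VRAM may not.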