Can you roll out a 3.0bpw quantized model?
#1 by xldistance - opened
My video card only has 48 GB of VRAM.
I'm not sure a 3.0bpw quant of this will even fit in 48 GB of VRAM, but here it is: gghfez/c4ai-command-a-03-2025-exl2-3bpw
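Back-of-envelope, assuming Command A's roughly 111B parameters: 111e9 × 3.0 bits ÷ 8 ≈ 41.6 GB for the weights alone, which leaves only about 6 GB of a 48 GB card for the KV cache, activations, and runtime overhead before you've loaded any real context.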
You'd probably have to use llama.cpp and offload some layers to the CPU; a rough sketch of that is below.
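A minimal sketch using the llama-cpp-python bindings, not a tested config for this model: the GGUF filename and the layer count are assumptions, and in practice you'd lower `n_gpu_layers` until the model plus KV cache fits in your 48 GB.

```python
from llama_cpp import Llama

# Hypothetical Q3 GGUF of Command A; any GGUF quant loads the same way.
llm = Llama(
    model_path="c4ai-command-a-03-2025-Q3_K_M.gguf",  # assumed filename
    n_gpu_layers=48,  # put only this many layers on the GPU; the rest run on CPU
    n_ctx=8192,       # longer contexts grow the KV cache and eat more VRAM
)

out = llm("Write a haiku about VRAM.", max_tokens=64)
print(out["choices"][0]["text"])
```

Lowering `n_gpu_layers` trades speed for VRAM headroom; with `n_gpu_layers=-1`, llama.cpp tries to offload every layer to the GPU.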