Can you roll out a 3.0bpw quantization model?

by xldistance

My video card only has 48 GB of VRAM.

I'm not sure a 3.0bpw quant of this will even fit in 48 GB of VRAM, but there's gghfez/c4ai-command-a-03-2025-exl2-3bpw.
You'll probably have to use llama.cpp and offload some layers to the CPU instead.
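If you go the llama.cpp route, partial offload looks roughly like this with llama-cpp-python. This is a minimal sketch, assuming a CUDA-enabled build and a GGUF quant of the model on disk; the filename and layer count below are hypothetical, so tune `n_gpu_layers` down until the model fits in 48 GB:

```python
# Minimal sketch of partial GPU offload via llama-cpp-python.
# Assumes a CUDA-enabled install and a GGUF quant on disk;
# the model path and layer count here are hypothetical examples.
from llama_cpp import Llama

llm = Llama(
    model_path="c4ai-command-a-03-2025-Q3_K_M.gguf",  # hypothetical filename
    n_gpu_layers=40,  # layers kept on the GPU; lower this if you run out of VRAM
    n_ctx=8192,       # context window; longer contexts also cost VRAM
)

out = llm("Write a haiku about quantization.", max_tokens=64)
print(out["choices"][0]["text"])
```

The layers not offloaded to the GPU run on the CPU, so generation gets slower, but the model fits.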
