Can ignore layers be supported in the w8a8_int8 quantization setting?
#12 · opened by jgfly
I quantized the DeepSeek R1 model using my own w8a8_int8 quantization algorithm, but the inference results were incorrect. After investigation, I found that the key difference between my implementation and yours lies in the handling of "ignore layers": specific layers that should not be quantized at all. How should I modify my code to properly exclude these layers from quantization?
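For context, here is a minimal sketch of one common approach: match module names against glob-style ignore patterns and skip any match before quantizing. The pattern list and layer names below are purely illustrative, not taken from any particular codebase:

```python
from fnmatch import fnmatch

# Hypothetical ignore list in the glob style used by several LLM
# quantization tools. Typical candidates: the output head and, for
# MoE models like DeepSeek R1, the expert router gates.
IGNORE_PATTERNS = [
    "lm_head",
    "model.layers.*.mlp.gate",
]

def should_quantize(layer_name: str, ignore_patterns=IGNORE_PATTERNS) -> bool:
    """Return False for layers that must stay in full precision."""
    return not any(fnmatch(layer_name, pat) for pat in ignore_patterns)

# Example: filter module names before applying w8a8_int8 quantization.
layer_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.gate",
    "lm_head",
]
to_quantize = [n for n in layer_names if should_quantize(n)]
print(to_quantize)  # only the q_proj layer survives the filter
```

In a real pipeline the same predicate would be applied while iterating the model's named modules, so that ignored layers keep their original weights and dtype.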