Can ignore layers be supported in the w8a8_int8 quantization setting?

#12
by jgfly - opened

I quantized the DeepSeek R1 model with my own w8a8_int8 quantization algorithm, but the inference results were wrong. After investigating, I found that the key difference between my implementation and yours is the handling of "ignore layers" - specific layers that should be left unquantized. How should I modify my code to exclude these layers from quantization?
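One common way to implement this is to match each module's fully qualified name against a list of glob-style ignore patterns and skip any match. The sketch below is illustrative only: the pattern list (`lm_head`, `*.mlp.gate`) and the helper names are hypothetical examples of layers often excluded from int8 quantization, not the actual ignore list from this repo - check the model's quantization config for the real one.

```python
from fnmatch import fnmatch

# Hypothetical ignore list; the real patterns come from the model's
# quantization config. lm_head and MoE gate layers are typical exclusions.
IGNORE_PATTERNS = ["lm_head", "*.mlp.gate"]

def should_ignore(layer_name, patterns=IGNORE_PATTERNS):
    """True if the layer matches any ignore pattern and should be
    kept in its original precision (no int8 weight/activation quant)."""
    return any(fnmatch(layer_name, p) for p in patterns)

def plan_quantization(layer_names):
    """Toy driver: map each layer name to 'int8' or 'skip'."""
    return {name: ("skip" if should_ignore(name) else "int8")
            for name in layer_names}

layers = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.gate",
    "lm_head",
]
print(plan_quantization(layers))
# → {'model.layers.0.self_attn.q_proj': 'int8',
#    'model.layers.0.mlp.gate': 'skip',
#    'lm_head': 'skip'}
```

In a real pipeline you would iterate over `model.named_modules()` and apply the same name check before replacing a linear layer with its quantized counterpart, so ignored layers keep their fp16/bf16 weights and run through the original kernels.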
