Can ignore layers be supported in the w8a8_int8 quantization setting?
#12 · opened by jgfly
I quantized the DeepSeek R1 model using my own w8a8_int8 quantization algorithm, but the inference results were incorrect. After investigation, I found that the key difference between my implementation and yours lies in the handling of "ignore layers": specific layers that should not be quantized at all. How should I modify my code to properly exclude these layers from quantization?
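For context, here is a minimal sketch of one common approach: match module names against glob-style ignore patterns and skip any match before quantizing. The pattern list and layer names below are purely illustrative, not taken from any particular codebase:

```python
from fnmatch import fnmatch

# Hypothetical ignore list in the glob style used by several LLM
# quantization tools. Typical candidates: the output head and, for
# MoE models like DeepSeek R1, the expert router gates.
IGNORE_PATTERNS = [
    "lm_head",
    "model.layers.*.mlp.gate",
]

def should_quantize(layer_name: str, ignore_patterns=IGNORE_PATTERNS) -> bool:
    """Return False for layers that must stay in full precision."""
    return not any(fnmatch(layer_name, pat) for pat in ignore_patterns)

# Example: filter module names before applying w8a8_int8 quantization.
layer_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.gate",
    "lm_head",
]
to_quantize = [n for n in layer_names if should_quantize(n)]
print(to_quantize)  # only the q_proj layer survives the filter
```

In a real pipeline the same predicate would be applied while iterating the model's named modules, so that ignored layers keep their original weights and dtype.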