Abstract
Frontier models that generate extended reasoning traces inadvertently produce rich token sequences that can facilitate model distillation. Recognizing this vulnerability, model owners may seek sampling strategies that limit the effectiveness of distillation without compromising model performance. Antidistillation sampling provides exactly this capability. By strategically modifying a model's next-token probability distribution, antidistillation sampling poisons reasoning traces, rendering them significantly less effective for distillation while preserving the model's practical utility. For further details, see https://antidistillation.com.
Community
Really nice paper!!
After reading, I have one question.
What is the underlying intuition behind the definition of Eq. 6? In my view, antidistillation sampling aims to maximize the delta term in Eq. 5, or its final simplified form in Eq. 11, but I do not see the relation between that delta term and Eq. 6. I would greatly appreciate a more in-depth explanation. Thank you! :D
Thanks! In Eq. 6 we take the teacher's tempered log-probabilities and nudge them in the direction that most poisons the student: the adjusted logits are (1/τ)·log p_T + λ·Δ, and we sample from their softmax. The vanilla term (1/τ)·log p_T keeps teacher-preferred tokens likely, so the trace remains a faithful sample from the teacher, while the added λ·Δ term up-weights exactly those tokens whose fine-tuning update would increase the student's loss. So Eq. 6 is the sampling-time mechanism for maximizing the delta term of Eq. 5/Eq. 11: rather than optimizing Δ over entire traces, it greedily biases each next-token draw toward high-Δ tokens.
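For concreteness, here is a minimal sketch of that adjustment at a single sampling step, assuming the teacher's log-probabilities and the per-token Δ scores (e.g., from the finite-difference estimate in Eq. 11) are already computed; the function and variable names are illustrative, not taken from the paper's released code.

```python
import torch

def antidistillation_sample(teacher_log_probs: torch.Tensor,
                            delta: torch.Tensor,
                            tau: float = 1.0,
                            lam: float = 1.0) -> torch.Tensor:
    """Draw one token from the adjusted distribution of Eq. 6 (sketch).

    teacher_log_probs: log p_T(. | x_<t) over the vocabulary, shape (V,)
    delta:             per-token poisoning scores Delta(x_t), shape (V,)
    tau:               sampling temperature
    lam:               poisoning strength lambda
    """
    # Tempered teacher term keeps teacher-preferred tokens likely;
    # the lam * delta term up-weights tokens whose fine-tuning update
    # would increase the student's loss.
    adjusted_logits = teacher_log_probs / tau + lam * delta
    probs = torch.softmax(adjusted_logits, dim=-1)  # renormalize over vocab
    return torch.multinomial(probs, num_samples=1)
```

Note that setting lam = 0 recovers ordinary temperature sampling from the teacher, which makes explicit that the λ·Δ term is the only thing doing the poisoning.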
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners (2025)
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models (2025)
- UNDO: Understanding Distillation as Optimization (2025)
- TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers' Guidance (2025)
- OpenCodeReasoning: Advancing Data Distillation for Competitive Coding (2025)
- Cross-Tokenizer Distillation via Approximate Likelihood Matching (2025)
- Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? (2025)
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend