Abstract
Frontier models that generate extended reasoning traces inadvertently produce rich token sequences that can facilitate model distillation. Recognizing this vulnerability, model owners may seek sampling strategies that limit the effectiveness of distillation without compromising model performance. Antidistillation sampling provides exactly this capability. By strategically modifying a model's next-token probability distribution, antidistillation sampling poisons reasoning traces, rendering them significantly less effective for distillation while preserving the model's practical utility. For further details, see https://antidistillation.com.
Community
Really nice paper!!
After reading, I have one question.
What is the underlying intuition behind the definition of Eq. 6? In my view, antidistillation sampling aims to maximize the delta term in Eq. 5, or its final simplified form in Eq. 11, but I do not see the relation between that delta term and Eq. 6. I would greatly appreciate a more in-depth explanation. Thank you! :D
Thanks! In Eq. 6 we take the teacher's tempered log-probabilities and nudge them in the direction that most poisons the student: the adjusted logits are (1/τ)·log p_T + λ·Δ, and we sample from their softmax. The vanilla term (1/τ)·log p_T keeps teacher-preferred tokens likely, so the trace remains a faithful sample from the teacher, while the added λ·Δ term up-weights exactly those tokens whose fine-tuning update would increase the student's loss. So Eq. 6 is the sampling-time mechanism for maximizing the delta term of Eq. 5/Eq. 11: rather than optimizing Δ over entire traces, it greedily biases each next-token draw toward high-Δ tokens.
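For concreteness, here is a minimal sketch of that adjustment at a single sampling step, assuming the teacher's log-probabilities and the per-token Δ scores (e.g., from the finite-difference estimate in Eq. 11) are already computed; the function and variable names are illustrative, not taken from the paper's released code.

```python
import torch

def antidistillation_sample(teacher_log_probs: torch.Tensor,
                            delta: torch.Tensor,
                            tau: float = 1.0,
                            lam: float = 1.0) -> torch.Tensor:
    """Draw one token from the adjusted distribution of Eq. 6 (sketch).

    teacher_log_probs: log p_T(. | x_<t) over the vocabulary, shape (V,)
    delta:             per-token poisoning scores Delta(x_t), shape (V,)
    tau:               sampling temperature
    lam:               poisoning strength lambda
    """
    # Tempered teacher term keeps teacher-preferred tokens likely;
    # the lam * delta term up-weights tokens whose fine-tuning update
    # would increase the student's loss.
    adjusted_logits = teacher_log_probs / tau + lam * delta
    probs = torch.softmax(adjusted_logits, dim=-1)  # renormalize over vocab
    return torch.multinomial(probs, num_samples=1)
```

Note that setting lam = 0 recovers ordinary temperature sampling from the teacher, which makes explicit that the λ·Δ term is the only thing doing the poisoning.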
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners (2025)
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models (2025)
- UNDO: Understanding Distillation as Optimization (2025)
- TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers' Guidance (2025)
- OpenCodeReasoning: Advancing Data Distillation for Competitive Coding (2025)
- Cross-Tokenizer Distillation via Approximate Likelihood Matching (2025)
- Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? (2025)
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend