Antidistillation Sampling
April 17, 2025
Authors: Yash Savani, Asher Trockman, Zhili Feng, Avi Schwarzschild, Alexander Robey, Marc Finzi, J. Zico Kolter
cs.AI
Abstract
Frontier models that generate extended reasoning traces inadvertently produce
rich token sequences that can facilitate model distillation. Recognizing this
vulnerability, model owners may seek sampling strategies that limit the
effectiveness of distillation without compromising model performance.
Antidistillation sampling provides exactly this capability. By
strategically modifying a model's next-token probability distribution,
antidistillation sampling poisons reasoning traces, rendering them
significantly less effective for distillation while preserving the model's
practical utility. For further details, see https://antidistillation.com.
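
The abstract describes the mechanism only at a high level: sampling from a modified next-token distribution rather than the teacher's raw one. The sketch below illustrates that general shape, assuming a hypothetical per-token `distill_penalty` signal as a stand-in for the paper's actual adjustment term; it is a minimal illustration, not the authors' implementation (see https://antidistillation.com for the real method).

```python
# Minimal sketch of penalized next-token sampling. The teacher's logits
# are shifted by a per-token penalty estimating how useful each token
# would be to a would-be student; lam = 0 recovers plain sampling.
# `distill_penalty` is a hypothetical placeholder signal, not the
# paper's derived adjustment.

import torch
import torch.nn.functional as F


def antidistillation_sample(teacher_logits: torch.Tensor,
                            distill_penalty: torch.Tensor,
                            lam: float = 1.0,
                            temperature: float = 1.0) -> int:
    """Sample one token id from a penalized next-token distribution.

    teacher_logits: (vocab,) raw logits from the teacher model.
    distill_penalty: (vocab,) proxy estimate of each token's value for
        distillation (hypothetical stand-in here).
    lam: trade-off between model utility and distillation resistance.
    """
    adjusted = teacher_logits / temperature - lam * distill_penalty
    probs = F.softmax(adjusted, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()


# Toy usage with random tensors standing in for real model outputs.
vocab_size = 32
logits = torch.randn(vocab_size)
penalty = torch.randn(vocab_size).abs()  # placeholder proxy signal
token_id = antidistillation_sample(logits, penalty, lam=0.5)
print("sampled token id:", token_id)
```

Because the penalty enters the logits before the softmax, the modified sampler still produces a valid probability distribution, which is consistent with the abstract's claim that the model's practical utility can be preserved while the sampled traces become less useful to a student.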