Antidistillation Sampling

April 17, 2025
Authors: Yash Savani, Asher Trockman, Zhili Feng, Avi Schwarzschild, Alexander Robey, Marc Finzi, J. Zico Kolter
cs.AI

Abstract

Frontier models that generate extended reasoning traces inadvertently produce rich token sequences that can facilitate model distillation. Recognizing this vulnerability, model owners may seek sampling strategies that limit the effectiveness of distillation without compromising model performance. Antidistillation sampling provides exactly this capability. By strategically modifying a model's next-token probability distribution, antidistillation sampling poisons reasoning traces, rendering them significantly less effective for distillation while preserving the model's practical utility. For further details, see https://antidistillation.com.
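The abstract only gestures at the mechanism. As a rough illustration of what "strategically modifying a model's next-token probability distribution" could look like in practice, the sketch below shifts each token's logit by a per-token penalty before sampling, so that tokens deemed useful to a distilling student are down-weighted. This is a minimal conceptual sketch, not the authors' algorithm: the `penalty` array and the `alpha` weight are hypothetical placeholders, whereas the paper derives its adjustment term in a principled way (see https://antidistillation.com).

```python
import numpy as np

def antidistillation_sample(logits, penalty, alpha=1.0, temperature=1.0, rng=None):
    """Sample a next token from logits adjusted by a per-token penalty.

    Hypothetical interface: `penalty[i]` is a score estimating how useful
    token i would be to a distilling student; `alpha` controls how strongly
    such tokens are suppressed. alpha=0 recovers ordinary sampling.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Shift the logits away from highly "distillable" tokens, then softmax.
    adjusted = logits / temperature - alpha * penalty
    probs = np.exp(adjusted - adjusted.max())  # subtract max for stability
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Toy usage: a 5-token vocabulary with made-up logits and penalties.
logits = np.array([2.0, 1.5, 0.3, -0.5, -1.0])
penalty = np.array([0.0, 3.0, 0.0, 0.0, 0.0])  # token 1 is marked as distillable
token_id = antidistillation_sample(logits, penalty, alpha=1.0)
print("sampled token id:", token_id)
```

The key design point the abstract emphasizes is the trade-off: the penalty must be small enough to preserve the model's own utility while still degrading the quality of the traces as distillation data.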
