Top-nσ: Not All Logits Are You Need
November 12, 2024
Authors: Chenxia Tang, Jianchun Liu, Hongli Xu, Liusheng Huang
cs.AI
Abstract
Large language models (LLMs) typically employ greedy decoding or
low-temperature sampling for reasoning tasks, reflecting a perceived trade-off
between diversity and accuracy. We challenge this convention by introducing
top-nσ, a novel sampling method that operates directly on pre-softmax
logits by leveraging a statistical threshold. Our key insight is that logits
naturally separate into a Gaussian-distributed noisy region and a distinct
informative region, enabling efficient token filtering without complex
probability manipulations. Unlike existing methods (e.g., top-p, min-p)
that inadvertently include more noise tokens at higher temperatures,
top-nσ maintains a stable sampling space regardless of temperature
scaling. We also provide a theoretical analysis of top-nσ to better
understand its behavior. The extensive experimental results across four
reasoning-focused datasets demonstrate that our method not only outperforms
existing sampling approaches but also surpasses greedy decoding, while
maintaining consistent performance even at high temperatures.
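
The abstract does not spell out the filtering rule, so the following is only a minimal sketch under an explicit assumption: a token is kept when its logit lies within n standard deviations of the maximum logit, and sampling then proceeds over the surviving tokens. The function name `top_n_sigma_sample` and the parameters `n`, `temperature`, and `rng` are illustrative choices, not taken from the authors' code.

```python
import numpy as np


def top_n_sigma_sample(logits, n=1.0, temperature=1.0, rng=None):
    """Sample one token id after masking low logits.

    Assumed rule (not stated in the abstract): keep tokens whose
    temperature-scaled logit is >= max(logits) - n * std(logits).
    Returns the sampled token id and the boolean keep mask.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Work on pre-softmax logits, scaled by temperature.
    scaled = np.asarray(logits, dtype=np.float64) / temperature

    # Statistical threshold: n standard deviations below the maximum.
    threshold = scaled.max() - n * scaled.std()
    keep = scaled >= threshold

    # Mask filtered tokens with -inf so they receive zero probability.
    masked = np.where(keep, scaled, -np.inf)

    # Numerically stable softmax over the surviving tokens.
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()

    token_id = int(rng.choice(len(probs), p=probs))
    return token_id, keep
```

Under this assumed rule, both the maximum and the standard deviation scale by the same factor 1/T, so the set of kept tokens does not depend on the temperature. This matches the stable sampling space the abstract describes. Temperature only changes how probability mass is spread among the kept tokens.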