Large language models (LLMs) typically employ greedy decoding or low-temperature sampling for reasoning tasks, reflecting a perceived trade-off between diversity and accuracy. We challenge this convention by introducing top-$n\sigma$, a novel sampling method that operates directly on pre-softmax logits by leveraging a statistical threshold. Our key insight is that logits naturally separate into a Gaussian-distributed noisy region and a distinct informative region, enabling efficient token filtering without complex probability manipulations. Unlike existing methods (e.g., top-$p$, min-$p$) that inadvertently include more noise tokens at higher temperatures, top-$n\sigma$ maintains a stable sampling space regardless of temperature scaling. We also provide a theoretical analysis of top-$n\sigma$ to better understand its behavior. Extensive experimental results on four reasoning-focused datasets demonstrate that our method not only outperforms existing sampling approaches but also surpasses greedy decoding, while maintaining consistent performance even at high temperatures.
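The filtering rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the threshold is the maximum logit minus $n$ times the standard deviation of the logits, computed on the raw (pre-temperature) logits so that the retained token set does not change with temperature; the function name and the default $n=1$ are illustrative choices.

```python
import numpy as np

def top_n_sigma_filter(logits, n=1.0, temperature=1.0):
    """Illustrative top-n-sigma sampling sketch: keep only tokens whose
    pre-softmax logit lies within n standard deviations of the maximum,
    then apply temperature and renormalize over the retained set."""
    logits = np.asarray(logits, dtype=np.float64)

    # Statistical threshold on the raw logits; because it is computed
    # before temperature scaling, the retained set is temperature-invariant.
    threshold = logits.max() - n * logits.std()
    mask = logits >= threshold

    # Apply temperature only for the final probabilities.
    scaled = logits / temperature
    scaled[~mask] = -np.inf  # drop tokens in the Gaussian "noise" region

    # Numerically stable softmax over the retained tokens.
    probs = np.exp(scaled - scaled[mask].max())
    return probs / probs.sum()
```

For example, with logits `[10.0, 9.5, 0.1, 0.0, -0.2]` the two high-logit tokens survive the cut while the low, noise-like cluster is excluded, and raising the temperature reshapes the probabilities over the survivors without admitting new tokens.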