Large language models (LLMs) achieve remarkable generative performance, yet their output quality depends heavily on the decoding strategy. Sampling-based methods (e.g., top-k, nucleus sampling) and search-and-select methods (e.g., beam search, best-of-n, majority voting) can improve upon greedy decoding, but both have limitations: sampling generally commits to a single path, while search often expends excessive computation regardless of task complexity. To address these limitations, we introduce Entropy-informed decoding (EDEN), a plug-and-play, model-agnostic decoding framework that adaptively allocates computation according to the model's own uncertainty, approximating higher-width beam search with fewer expansions. At each generation step, EDEN estimates the entropy of the next-token distribution and sets the branching factor as a monotone function of that entropy. It expands more candidates in high-entropy regions and follows a greedier path in low-entropy regions, which improves token efficiency. Experiments on complex tasks, including mathematical reasoning, code generation, and scientific question answering, demonstrate that EDEN consistently improves output quality over existing decoding strategies and achieves better accuracy-expansion trade-offs than fixed-width beam search. By treating next-token selection as a noisy maximisation problem, we prove that branching factors monotone in entropy are guaranteed to find better (i.e., more probable) continuations than any fixed branching factor within the same total expansion budget, and we derive explicit regret rates that characterise the benefit of adaptive allocation.
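As a minimal sketch of the per-step rule described above, the Python below computes the entropy of a next-token distribution and maps it to a branching factor. It assumes a linear schedule between hypothetical bounds `k_min` and `k_max`, normalised by the maximum possible entropy `log(V)`. The abstract only requires the branching factor to be monotone in entropy and does not give EDEN's actual schedule. The function names `entropy`, `branching_factor`, and `expand_step` are illustrative. The sketch covers a single expansion step, not the full search.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def branching_factor(h: float, vocab_size: int, k_min: int = 1, k_max: int = 8) -> int:
    """Hypothetical schedule: map entropy linearly onto [k_min, k_max].

    Dividing by log(vocab_size), the largest possible entropy, keeps the
    ratio in [0, 1]. The mapping is non-decreasing in h, so higher
    uncertainty never yields fewer branches (the monotonicity property).
    """
    h_max = math.log(vocab_size) if vocab_size > 1 else 1.0
    ratio = min(max(h / h_max, 0.0), 1.0)
    # round() is non-decreasing, so monotonicity survives discretisation.
    return k_min + round(ratio * (k_max - k_min))


def expand_step(probs: Sequence[float], k_min: int = 1, k_max: int = 8) -> List[Tuple[int, float]]:
    """Return the top-k (token_id, prob) candidates, with k set by the entropy."""
    k = branching_factor(entropy(probs), len(probs), k_min, k_max)
    k = min(k, len(probs))  # cannot branch wider than the vocabulary
    ranked = sorted(enumerate(probs), key=lambda t: t[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    peaked = [0.97, 0.01, 0.01, 0.01]  # low entropy: near-greedy step
    flat = [0.25, 0.25, 0.25, 0.25]    # maximal entropy: widest branching
    print(expand_step(peaked))
    print(expand_step(flat))
```

On the flat distribution the sketch keeps all four tokens. On the peaked one it keeps far fewer, so expansions concentrate where the model is uncertain. Any other non-decreasing mapping, such as a step function with entropy thresholds, would satisfy the same monotonicity condition.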