Decoding sits between a language model and everything we do with it, yet it is still treated as a heuristic knob-tuning exercise. We argue decoding should instead be understood as a principled optimisation layer: at each token, we solve a regularised problem over the probability simplex that trades off model score against structural preferences and constraints. This single template recovers greedy decoding, Softmax sampling, Top-K, Top-P, and Sparsemax-style sparsity as special cases, and explains their common structure through shared optimality conditions. More importantly, the framework makes it easy to design new decoders without relying on folklore. We demonstrate this by deriving Best-of-K (BoK), a KL-anchored coverage objective aimed at multi-sample pipelines (self-consistency, reranking, verifier selection). BoK targets the probability of covering good alternatives within a fixed K-sample budget. Empirically, BoK-drawn samples improve downstream accuracy, for example by +18.6% for Qwen2.5-Math-7B on MATH500 at high sampling temperatures.
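To make the "single template" concrete, here is a minimal sketch of two of its special cases, assuming the standard regularisers: an entropy regulariser, whose simplex-constrained maximiser has the closed form softmax(z / tau) (with greedy decoding as the tau → 0 limit), and a squared-ℓ2 regulariser, whose maximiser is the Euclidean projection onto the simplex (sparsemax). Function names are illustrative, not from the paper.

```python
import numpy as np

def softmax_decode(z, tau=1.0):
    """Solve argmax_{p in simplex} <p, z> + tau * H(p).

    The optimality conditions give the closed form softmax(z / tau);
    as tau -> 0 this concentrates on argmax_i z_i (greedy decoding).
    """
    s = np.exp((z - z.max()) / tau)  # shift by max for numerical stability
    return s / s.sum()

def sparsemax_decode(z):
    """Solve argmax_{p in simplex} <p, z> - 0.5 * ||p||^2.

    This is the Euclidean projection of z onto the simplex, computed by
    the standard sort-and-threshold routine; it zeroes out low-score tokens.
    """
    z_sorted = np.sort(z)[::-1]           # scores in descending order
    cssv = np.cumsum(z_sorted)            # cumulative sums of sorted scores
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cssv     # which prefix stays in the support
    k_star = k[support][-1]               # size of the support set
    tau_star = (cssv[k_star - 1] - 1) / k_star  # shared threshold
    return np.maximum(z - tau_star, 0.0)
```

For logits `z = [2.0, 1.0, 0.1]`, `softmax_decode` spreads mass over all three tokens, `softmax_decode(z, tau=1e-3)` is effectively one-hot on the argmax, and `sparsemax_decode` returns exact zeros on the tail, illustrating how the choice of regulariser alone switches between dense, greedy, and sparse decoders.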