We study best-of-$N$ sampling for large language models (LLMs), where selection among the $N$ candidates is based on majority voting. In particular, we analyze the limit $N \to \infty$, which we denote as \boinflower. While this limiting strategy achieves impressive performance, it requires an infinite test-time budget. To address this, we propose an adaptive generation scheme that selects $N$ based on answer agreement, thereby allocating inference-time computation only where it is needed. Beyond adaptivity, we extend the framework to weighted ensembles of multiple LLMs and show that such mixtures can outperform every individual model. The optimal ensemble weighting is formulated as a mixed-integer linear program, which can be solved efficiently. Extensive experiments demonstrate the effectiveness of our approach.
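To make the adaptive scheme concrete, below is a minimal sketch of agreement-based early stopping for majority voting. The callable `generate_answer`, the agreement threshold `tau`, and the budget bounds `n_min`/`n_max` are illustrative assumptions, not the paper's exact procedure: answers are sampled one at a time and sampling stops as soon as the modal answer's empirical agreement exceeds the threshold.

```python
import random
from collections import Counter

def adaptive_majority_vote(generate_answer, tau=0.7, n_min=4, n_max=64):
    """Sample answers until the majority answer's empirical agreement
    reaches tau (or n_max is hit); return the majority answer and the
    number of samples used. All parameter names are hypothetical."""
    answers = []
    while len(answers) < n_max:
        answers.append(generate_answer())
        if len(answers) >= n_min:
            top_answer, top_count = Counter(answers).most_common(1)[0]
            if top_count / len(answers) >= tau:
                return top_answer, len(answers)
    # Budget exhausted: fall back to plain majority voting over n_max samples.
    return Counter(answers).most_common(1)[0][0], len(answers)

if __name__ == "__main__":
    # Toy stochastic "model" that answers "42" with probability 0.8.
    toy_model = lambda: "42" if random.random() < 0.8 else str(random.randint(0, 9))
    answer, n_used = adaptive_majority_vote(toy_model)
    print(f"majority answer={answer!r} after N={n_used} samples")
```

On easy queries where agreement is high, this rule terminates after a handful of samples, while hard queries with dispersed answers consume the full budget, which is the intuition behind adaptively allocating inference-time computation.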