LLM inference often generates a batch of candidate responses to a prompt and selects one via strategies such as majority voting or Best-of-$N$ (BoN). On difficult tasks, this single-shot selection often underperforms. Consequently, evaluations commonly report Pass@$k$: the agent may submit up to $k$ responses, and only the best of them counts toward the regret. Motivated by this, we study inference scaling in the more general Pass@$k$ inference setting and prove that neither majority voting nor BoN exhibits the desirable scaling with $k$ and the sampling budget $N$. Combining the advantages of majority voting and BoN, we propose a new inference strategy called Best-of-Majority (BoM), whose pivotal step restricts the candidates to the high-frequency responses among the $N$ samples before selecting the $k$ with the highest rewards. We prove that when the sampling budget is $N=\tilde\Omega(C^*)$, the regret of BoM is $O(\epsilon_{\mathrm{opt}}+\sqrt{\epsilon_{\mathrm{RM}}^2 C^*/k})$, where $C^*$ is the coverage coefficient, $\epsilon_{\mathrm{RM}}$ is the estimation error of the reward model, and $\epsilon_{\mathrm{opt}}$ is the estimation error of the reward at the optimal response. We further establish a matching lower bound, certifying that our algorithm is minimax optimal. Beyond optimality, BoM has a key advantage: unlike majority voting and BoN, its performance does not degrade as $N$ increases. Experiments on math reasoning problems show that BoM outperforms both majority voting and BoN.
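The BoM selection step described above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: responses are assumed to be directly comparable (e.g., canonicalized answer strings), `reward_fn` stands in for the estimated reward model, and the frequency cutoff `min_freq` is a placeholder for whatever high-frequency criterion the method prescribes.

```python
from collections import Counter

def best_of_majority(samples, reward_fn, k, min_freq=2):
    """Best-of-Majority (BoM) sketch: filter the N samples down to the
    high-frequency candidates, then return the top-k by estimated reward.
    `min_freq` is an illustrative cutoff, not the paper's exact rule."""
    counts = Counter(samples)
    # Majority filter: keep only responses appearing at least min_freq times.
    majority = [r for r in counts if counts[r] >= min_freq]
    if not majority:
        # Degenerate case: no response repeats, fall back to all unique samples.
        majority = list(counts)
    # Best-of step, restricted to the majority set: top-k by estimated reward.
    return sorted(majority, key=reward_fn, reverse=True)[:k]
```

Note how the frequency filter changes the outcome relative to plain BoN: a response with a spuriously high estimated reward but low sampling frequency is discarded before the reward-based selection, which is what shields BoM from reward-model error as $N$ grows.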