A simple strategy for improving LLM accuracy, especially on math and reasoning problems, is to sample multiple responses and submit the answer most consistently reached. In this paper we leverage Bayesian prior information to save on sampling costs, stopping once sufficient consistency is reached. Since the exact posterior is computationally intractable, we introduce an efficient "L-aggregated" stopping policy that tracks only the counts of the L-1 most frequent answers. Theoretically, we prove that L=3 is all you need: this coarse approximation suffices for asymptotic optimality, strictly dominates prior-free baselines, and admits fast posterior computation. Empirically, our policy identifies the most consistent (i.e., modal) LLM answer using fewer samples, and achieves comparable answer accuracy while cutting the number of LLM calls (and hence LLM inference cost) by up to 50%.
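The sampling-until-consistent idea can be sketched as follows. This is a minimal illustration, not the paper's actual Bayesian policy: the `margin` stopping rule below is a crude stand-in for the posterior-based stopping criterion, and the helper names (`sample_answer`, `sample_until_consistent`) are hypothetical. It does, however, mirror the L=3 "L-aggregated" idea of keeping only L-1 = 2 frequency statistics per step.

```python
import random
from collections import Counter

def sample_until_consistent(sample_answer, margin=3, max_calls=20):
    """Early-stopping self-consistency: draw one answer at a time and stop
    once the modal answer leads the runner-up by `margin` votes (a simplified
    stand-in for a posterior stopping rule) or the call budget runs out.
    Only the two largest counts are inspected at each step."""
    counts = Counter()
    for n in range(1, max_calls + 1):
        counts[sample_answer()] += 1
        top_two = counts.most_common(2)
        top = top_two[0][1]
        second = top_two[1][1] if len(top_two) > 1 else 0
        if top - second >= margin:
            break  # mode is sufficiently consistent; stop sampling
    mode, _ = counts.most_common(1)[0]
    return mode, n  # chosen answer and number of LLM calls used

# Usage: a noisy stand-in "LLM" that answers "42" 70% of the time.
rng = random.Random(0)
answer, calls = sample_until_consistent(
    lambda: "42" if rng.random() < 0.7 else str(rng.randint(0, 9)))
```

Compared with always drawing a fixed number of samples, stopping as soon as the mode is clear is what yields the inference-cost savings described above.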