Inference-time methods that aggregate and prune multiple samples have emerged as a powerful paradigm for steering large language models, yet we lack any principled understanding of their accuracy-cost tradeoffs. In this paper, we introduce a route to rigorously study such approaches using the lens of *particle filtering* algorithms such as Sequential Monte Carlo (SMC). Given a base language model and a *process reward model* estimating expected terminal rewards, we ask: *how accurately can we sample from a target distribution given some number of process reward evaluations?* Theoretically, we identify (1) simple criteria enabling non-asymptotic guarantees for SMC; (2) algorithmic improvements to SMC; and (3) a fundamental limit faced by all particle filtering methods. Empirically, we demonstrate that our theoretical criteria effectively govern the *sampling error* of SMC, though not necessarily its final *accuracy*, suggesting that theoretical perspectives beyond sampling may be necessary.
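As a concrete illustration of the setting, the following is a minimal sketch of SMC-style particle filtering on a toy problem. It is not the paper's algorithm or its improved variants. Everything in it is an assumption made for illustration: the function names (`smc_sample`, `sample_next`, `exact_prm`), the uniform binary "base model", the terminal reward (1 if the sequence contains at least one `1`, else 0), and the choice of multinomial resampling at every step. The process reward model here is exact, returning the true expected terminal reward of a prefix under the base model, which a learned model would only approximate.

```python
import random

HORIZON = 4  # toy sequence length (illustrative assumption)

def sample_next(prefix, rng):
    """Toy base model: next token is uniform over {0, 1}."""
    return rng.randint(0, 1)

def exact_prm(prefix):
    """Toy process reward model: exact expected terminal reward of a prefix.

    Terminal reward is 1 if the full sequence contains at least one 1.
    """
    if any(prefix):
        return 1.0
    return 1.0 - 0.5 ** (HORIZON - len(prefix))

def smc_sample(sample_next, prm, horizon, num_particles, rng):
    """Propose from the base model, reweight by PRM ratios, then resample."""
    particles = [[] for _ in range(num_particles)]
    values = [prm([])] * num_particles  # one PRM call for the empty prefix
    prm_calls = 1
    for _ in range(horizon):
        # Propose: extend every particle by one token from the base model.
        particles = [p + [sample_next(p, rng)] for p in particles]
        new_values = [prm(p) for p in particles]
        prm_calls += num_particles
        # Incremental weight: ratio of successive PRM values.
        weights = [nv / v if v > 0 else 0.0 for nv, v in zip(new_values, values)]
        if sum(weights) == 0:
            raise RuntimeError("all particles have zero weight")
        # Multinomial resampling proportional to the weights.
        idx = rng.choices(range(num_particles), weights=weights, k=num_particles)
        particles = [list(particles[i]) for i in idx]
        values = [new_values[i] for i in idx]
    return particles, prm_calls

rng = random.Random(0)
particles, prm_calls = smc_sample(sample_next, exact_prm, HORIZON, 32, rng)
```

In this sketch the cost is counted in process reward evaluations (`prm_calls`), which grows as the number of particles times the horizon. Because the toy reward model is exact, prefixes that can no longer reach reward are pruned by the resampling step; with an imperfect reward model the surviving particles would only approximate the target distribution.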