Power-SMC: Low-Latency Sequence-Level Power Sampling for Training-Free LLM Reasoning

Many recent reasoning gains in large language models can be explained as distribution sharpening: biasing generation toward high-likelihood trajectories already supported by the pretrained model, rather than modifying its weights. A natural formalization is the sequence-level power distribution $π_α(y\mid x)\propto p_θ(y\mid x)^α$ ($α>1$), which concentrates mass on whole sequences instead of adjusting token-level temperature. Prior work shows that Metropolis--Hastings (MH) sampling from this distribution recovers strong reasoning performance, but at order-of-magnitude inference slowdowns. We introduce Power-SMC, a training-free Sequential Monte Carlo scheme that targets the same objective while remaining close to standard decoding latency. Power-SMC advances a small particle set in parallel, corrects importance weights token-by-token, and resamples when necessary, all within a single GPU-friendly batched decode. We prove that temperature $τ=1/α$ is the unique prefix-only proposal minimizing incremental weight variance, interpret residual instability via prefix-conditioned Rényi entropies, and introduce an exponent-bridging schedule that improves particle stability without altering the target. On MATH500, Power-SMC matches or exceeds MH power sampling while reducing latency from $16$--$28\times$ to $1.4$--$3.3\times$ over baseline decoding. The code is available at https://github.com/ArminAzizi98/Power-SMC.
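The particle loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy two-token model `toy_next_probs`, the helper names, the fixed sequence length, and the ESS-based resampling threshold `ess_frac` are all assumptions made for illustration, and the exponent-bridging schedule is omitted. The sketch proposes each token from the temperature-$1/α$ distribution $q(v\mid y_{<t})\propto p_θ(v\mid y_{<t})^α$, in which case the incremental importance weight is the prefix-only normalizer $\sum_v p_θ(v\mid y_{<t})^α$.

```python
import math
import random


def toy_next_probs(prefix):
    """Hypothetical stand-in for an LLM: next-token probabilities over {0, 1}."""
    if not prefix:
        return [0.6, 0.4]
    return [0.5, 0.5] if prefix[-1] == 0 else [0.9, 0.1]


def normalize(log_w):
    """Turn log-weights into normalized weights, subtracting the max for stability."""
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


def power_smc(next_probs, alpha, n_particles, length, rng, ess_frac=0.5):
    """Toy SMC targeting pi_alpha(y) proportional to p(y)^alpha, fixed length."""
    particles = [[] for _ in range(n_particles)]
    log_w = [0.0] * n_particles
    for _ in range(length):
        for i, prefix in enumerate(particles):
            # Proposal: token probabilities raised to alpha (temperature 1/alpha).
            powered = [p ** alpha for p in next_probs(prefix)]
            z = sum(powered)
            tok = rng.choices(range(len(powered)), weights=powered)[0]
            particles[i] = prefix + [tok]
            # Incremental weight p^alpha / q equals z, which depends only on the prefix.
            log_w[i] += math.log(z)
        w = normalize(log_w)
        ess = 1.0 / sum(x * x for x in w)  # effective sample size
        if ess < ess_frac * n_particles:
            # Multinomial resampling, then reset to uniform weights.
            idx = rng.choices(range(n_particles), weights=w, k=n_particles)
            particles = [list(particles[j]) for j in idx]
            log_w = [0.0] * n_particles
    return particles, normalize(log_w)


rng = random.Random(0)
particles, weights = power_smc(toy_next_probs, alpha=4.0, n_particles=5000,
                               length=2, rng=rng)
# Weighted estimate of P(first token = 1) under the sequence-level power target.
estimate = sum(w for y, w in zip(particles, weights) if y[0] == 1)
print(round(estimate, 3))
```

In this toy model with $α=4$, enumerating the four length-2 sequences gives a probability of about 0.509 that the first token is 1 under the power distribution, whereas token-level temperature sampling alone picks it with probability about 0.165. The gap is what the importance weights correct: the prefix "1" leads to a much more peaked continuation, so particles starting with it receive larger weights.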

