A recurring pattern in "reasoning without training" is that base LLMs already assign non-trivial probability mass to correct multi-step solutions; the bottleneck is locating these modes efficiently at inference time. Power sampling provides a principled way to bias decoding toward such modes by targeting p_theta(x)^alpha with alpha > 1, but practical approximations must account for future-dependent correction factors that determine which prefixes remain promising. We introduce Auxiliary Particle Power Sampling (APPS), a blockwise particle algorithm for approximating the sequence-level power target with a bounded population of partial solutions. APPS propagates hypotheses in parallel using proposal-corrected power reweighting and refines their survival through future-value-guided selection at resampling boundaries. This redistributes finite compute across competing prefixes rather than committing to a single unfolding path, while providing a direct scaling knob in the particle count and predictable peak memory. We instantiate the future-value signal with short-horizon rollouts and also study an amortized variant that replaces rollouts with a lightweight learned selection head. More broadly, APPS improves the accuracy--runtime trade-off of training-free decoding, further supporting the view that inference-time power approximation can recover gains often attributed to post-training.
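To make the idea of proposal-corrected power reweighting concrete, the following is a minimal sketch of a generic blockwise particle sampler for the target p_theta(x)^alpha. It is not the APPS algorithm itself: it omits the future-value-guided selection step and simply resamples on the accumulated power weights at each block boundary. The toy Markov-chain "model" (`TOY_NEXT`), the function names, and all parameters are assumptions made for illustration. Because tokens are proposed from the base model, the incremental weight for each token is p^alpha / p = p^(alpha - 1).

```python
import math
import random

# Hypothetical toy "language model": a first-order Markov chain over 3 tokens.
# The key is the previous token (None at the start of the sequence).
TOY_NEXT = {
    None: [0.5, 0.3, 0.2],
    0: [0.7, 0.2, 0.1],
    1: [0.1, 0.6, 0.3],
    2: [0.3, 0.3, 0.4],
}


def next_probs(prefix):
    """Next-token distribution of the toy base model given a prefix."""
    last = prefix[-1] if prefix else None
    return TOY_NEXT[last]


def sequence_log_prob(seq):
    """Log-probability of a full sequence under the toy base model."""
    return sum(math.log(next_probs(seq[:t])[seq[t]]) for t in range(len(seq)))


def particle_power_sample(alpha, num_particles, seq_len, block_len, rng):
    """Approximate samples from p(x)^alpha with a bounded particle population.

    Each particle is extended by one block of tokens drawn from the base
    model, its log-weight accumulates (alpha - 1) * log p(token | prefix),
    and the population is resampled at every block boundary.
    """
    particles = [[] for _ in range(num_particles)]
    log_w = [0.0] * num_particles
    for start in range(0, seq_len, block_len):
        steps = min(block_len, seq_len - start)
        for i in range(num_particles):
            for _ in range(steps):
                probs = next_probs(particles[i])
                tok = rng.choices(range(len(probs)), weights=probs)[0]
                particles[i].append(tok)
                # Proposal-corrected power weight: p^alpha / p = p^(alpha - 1).
                log_w[i] += (alpha - 1.0) * math.log(probs[tok])
        # Multinomial resampling at the block boundary (stabilized by the max).
        m = max(log_w)
        weights = [math.exp(lw - m) for lw in log_w]
        idx = rng.choices(range(num_particles), weights=weights, k=num_particles)
        particles = [list(particles[j]) for j in idx]
        log_w = [0.0] * num_particles
    return particles


if __name__ == "__main__":
    samples = particle_power_sample(
        alpha=4.0, num_particles=8, seq_len=6, block_len=2, rng=random.Random(0)
    )
    for s in samples:
        print(s, round(sequence_log_prob(s), 3))
```

In this sketch the particle count is the only compute knob, and memory is bounded by `num_particles * seq_len` tokens. A future-value signal, such as the short-horizon rollouts described above, would enter at the resampling step by adjusting which prefixes survive; how APPS does this is not specified here and is left out of the sketch.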