Test-Time Scaling enhances the reasoning capabilities of Large Language Models by allocating additional inference compute to broaden the exploration of the solution space. However, existing search strategies typically treat rollouts as disposable samples, where valuable intermediate insights are effectively discarded after each trial. This systemic memorylessness leads to massive computational redundancy, as models repeatedly re-derive discovered conclusions and revisit known dead ends across extensive attempts. To bridge this gap, we propose \textbf{Recycling Search Experience (RSE)}, a self-guided, training-free strategy that turns test-time search from a series of isolated trials into a cumulative process. By actively distilling raw trajectories into a shared experience bank, RSE enables positive recycling of intermediate conclusions to shortcut redundant derivations and negative recycling of failure patterns to prune encountered dead ends. Theoretically, we provide an analysis that formalizes the efficiency gains of RSE, validating its advantage over independent sampling in solving complex reasoning tasks. Empirically, extensive experiments on HMMT24, HMMT25, IMO-Bench, and HLE show that RSE consistently outperforms strong baselines with comparable computational cost, achieving state-of-the-art scaling efficiency.