Inference-time scaling strategies, particularly Monte Carlo Tree Search (MCTS), have significantly enhanced the reasoning capabilities of Large Language Models (LLMs). However, current approaches remain predominantly stateless: they discard successful reasoning patterns after each problem instance and fail to mimic the empirical accumulation of wisdom characteristic of human problem-solving. To bridge this gap, we introduce Empirical-MCTS, a dual-loop framework that transforms stateless search into a continuous, non-parametric learning process. The framework unifies local exploration with global memory optimization through two novel mechanisms: Pairwise-Experience-Evolutionary Meta-Prompting (PE-EMP) and a Memory Optimization Agent. PE-EMP acts as a reflexive optimizer within the local search, using pairwise feedback to dynamically synthesize adaptive criteria and evolve meta-prompts (system prompts) in real time. In parallel, the Memory Optimization Agent manages a global repository as a dynamic policy prior, employing atomic operations to distill high-quality insights across problems. Extensive evaluations on complex reasoning benchmarks, including AIME25, ARC-AGI-2, and MathArena Apex, show that Empirical-MCTS significantly outperforms both stateless MCTS strategies and standalone experience-driven agents. These results underscore the necessity of coupling structured search with empirical accumulation for mastering complex, open-ended reasoning tasks.
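To make the dual-loop structure concrete, the following is a minimal sketch of how the outer memory loop and the inner search loop might fit together. All names here (empirical_mcts, solve_with_mcts, pe_emp_update, distill_insights, and the stubbed llm call) are illustrative placeholders rather than the paper's actual API, and the inner loop is flattened to a best-of-two comparison instead of a full tree search.

```python
def llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a dummy completion."""
    return f"completion for: {prompt[:40]}..."

def pairwise_feedback(a: str, b: str) -> str:
    """Hypothetical pairwise judge choosing the better of two candidates."""
    return a if len(a) >= len(b) else b  # stand-in for an LLM-based comparison

def pe_emp_update(meta_prompt: str, winner: str, loser: str) -> str:
    """Sketch of PE-EMP: synthesize a criterion from pairwise feedback and
    fold it back into the evolving meta-prompt (system prompt)."""
    criterion = llm(f"Why is A better than B?\nA: {winner}\nB: {loser}")
    return meta_prompt + "\n- " + criterion

def distill_insights(memory: list[str], trace: list[str]) -> list[str]:
    """Sketch of the Memory Optimization Agent: an atomic 'add' operation
    that distills a reusable insight from the local search trace."""
    insight = llm("Summarize the reusable pattern in: " + " | ".join(trace))
    return memory + [insight]

def solve_with_mcts(problem: str, meta_prompt: str, memory: list[str],
                    iterations: int = 8) -> tuple[str, list[str], str]:
    """Inner (local) loop: explore candidates while evolving the meta-prompt."""
    trace: list[str] = []
    best = ""
    prior = "\n".join(memory)  # global repository used as a policy prior
    for _ in range(iterations):
        a = llm(f"{meta_prompt}\n{prior}\nSolve: {problem}")
        b = llm(f"{meta_prompt}\n{prior}\nSolve differently: {problem}")
        winner = pairwise_feedback(a, b)
        loser = b if winner is a else a
        meta_prompt = pe_emp_update(meta_prompt, winner, loser)
        trace.append(winner)
        best = winner
    return best, trace, meta_prompt

def empirical_mcts(problems: list[str]) -> list[str]:
    """Outer (global) loop: carry non-parametric memory across problems."""
    memory: list[str] = []
    meta_prompt = "You are a careful, step-by-step reasoner."
    answers = []
    for p in problems:
        answer, trace, meta_prompt = solve_with_mcts(p, meta_prompt, memory)
        memory = distill_insights(memory, trace)  # global memory optimization
        answers.append(answer)
    return answers

if __name__ == "__main__":
    print(empirical_mcts(["toy problem 1", "toy problem 2"]))
```

The point of the sketch is only the control flow: the inner loop consumes the memory as a prior and evolves the meta-prompt from pairwise feedback, while the outer loop updates the memory after each problem so that later searches start from accumulated experience.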