Experience-Guided Adaptation of Inference-Time Reasoning Strategies

Enabling agentic AI systems to adapt their problem-solving approaches based on post-training interactions remains a fundamental challenge. Systems that update and maintain a memory at inference time have been proposed, but existing designs steer the system only by modifying the textual input to a language model or agent, so they cannot change sampling parameters, remove tools, modify system prompts, or switch between agentic and workflow paradigms. Systems that adapt more flexibly, on the other hand, require offline optimization and remain static once deployed. We present Experience-Guided Reasoner (EGuR), which dynamically generates tailored strategies at inference time based on accumulated experience, where a strategy is a complete computational procedure involving LLM calls, tools, sampling parameters, and control logic. We achieve this using an LLM-based meta-strategy (a strategy that outputs strategies), which enables adaptation of every strategy component: prompts, sampling parameters, tool configurations, and control logic. EGuR operates through two components: a Guide generates multiple candidate strategies conditioned on the current problem and a structured memory of past experiences, while a Consolidator integrates execution feedback to improve future strategy generation. The result is a complete, ready-to-run strategy optimized for each problem, which can be cached, retrieved, and executed as needed without wasting resources. Across five challenging benchmarks (AIME 2025, 3-SAT, and three Big Bench Extra Hard tasks), EGuR achieves up to 14% accuracy improvements over the strongest baselines while reducing computational costs by up to 111x, with both metrics improving as the system gains experience.
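
To make the Guide/Consolidator loop concrete, the following is a minimal Python sketch. It illustrates the idea under strong assumptions and is not EGuR's implementation: the names `Strategy`, `Feedback`, `Memory`, `guide`, `consolidate`, `solve`, and `fake_execute` are hypothetical, and the LLM-based Guide is replaced by a stand-in that merely ranks a hand-written pool of strategies by their recorded accuracy and cost, so the example runs without calling a model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Strategy:
    """One complete, runnable procedure (all fields are illustrative)."""
    system_prompt: str
    temperature: float
    tools: tuple[str, ...]
    control: str  # "workflow" (fixed steps) or "agent" (tool-using loop)


@dataclass
class Feedback:
    correct: bool
    cost: float  # units (tokens, dollars, ...) are up to the caller


@dataclass
class Memory:
    # Maps (task_type, strategy) -> list of Feedback from past executions.
    records: dict = field(default_factory=dict)


def guide(task_type, memory, pool, k=2):
    """Stand-in for the Guide: return the k most promising strategies.

    Assumption: the Guide described in the text generates strategies with
    an LLM. Here we only reorder a fixed pool using remembered feedback.
    """
    def score(strategy):
        history = memory.records.get((task_type, strategy), [])
        if not history:
            return (0.5, 0.0)  # untried strategies get a neutral score
        accuracy = sum(f.correct for f in history) / len(history)
        avg_cost = sum(f.cost for f in history) / len(history)
        return (accuracy, -avg_cost)  # prefer accurate, then cheap

    return sorted(pool, key=score, reverse=True)[:k]


def consolidate(memory, task_type, strategy, feedback):
    """Stand-in for the Consolidator: fold execution feedback into memory."""
    memory.records.setdefault((task_type, strategy), []).append(feedback)


def solve(problem, task_type, memory, pool, execute):
    """Run each candidate strategy and record how it went."""
    results = []
    for strategy in guide(task_type, memory, pool):
        feedback = execute(strategy, problem)
        consolidate(memory, task_type, strategy, feedback)
        results.append((strategy, feedback))
    return results


# Toy demo with a fake executor (no model or solver is actually called).
POOL = [
    Strategy("Think step by step.", 0.7, (), "workflow"),
    Strategy("Encode the formula and call the solver.", 0.0,
             ("sat_solver",), "agent"),
]


def fake_execute(strategy, problem):
    # Pretend only the solver-backed strategy succeeds, and more cheaply.
    if "sat_solver" in strategy.tools:
        return Feedback(correct=True, cost=1.0)
    return Feedback(correct=False, cost=5.0)


memory = Memory()
solve("(x1 or x2) and (not x1 or x3)", "3-SAT", memory, POOL, fake_execute)
best = guide("3-SAT", memory, POOL, k=1)[0]
print(best.control, best.tools)
```

In this toy run, both strategies are tried once on the first problem. Afterwards the stand-in Guide ranks the solver-backed strategy first for the `3-SAT` task type, because the recorded feedback shows it to be both correct and cheaper. This mirrors, in a very reduced form, the claim that accuracy and cost improve as experience accumulates.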
