Ask Only When Needed: Proactive Retrieval from Memory and Skills for Experience-Driven Lifelong Agents

Online lifelong learning enables agents to accumulate experience across interactions and continually improve on long-horizon tasks. However, existing methods typically treat retrieval from past experience as a passive operation, triggering it only at task initialization or after completing a step. Consequently, agents often fail to identify knowledge gaps during interaction and to proactively retrieve the most useful experience for the current decision. To address this limitation, we present ProactAgent, an experience-driven lifelong learning framework for proactive retrieval over a structured experience base. We first introduce Experience-Enhanced Online Evolution (ExpOnEvo), which enables continual improvement through both policy updates and memory refinement. The experience base organizes historical interactions into typed repositories, including factual memory, episodic memory, and behavioral skills, so that retrieval can provide both relevant evidence and actionable guidance. On top of this, we propose Proactive Reinforcement Learning-based Retrieval (ProactRL), which models retrieval as an explicit policy action and learns when and what to retrieve via paired-branch process rewards. By comparing continuations from identical interaction prefixes with and without retrieval, ProactRL provides step-level supervision for retrieval decisions, encouraging retrieval only when it leads to better task outcomes or higher efficiency. Experiments on SciWorld, AlfWorld, and StuLife show that ProactAgent consistently improves lifelong agent performance, achieving success rates of 73.50% on SciWorld and 71.28% on AlfWorld while substantially reducing retrieval overhead, and attains performance competitive with proprietary models on StuLife.
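
The paired-branch idea described above can be illustrated with a minimal sketch. The abstract does not give the reward formula, so everything here is an assumption made for illustration: the names `BranchOutcome` and `paired_branch_reward`, the additive form of the reward, and the `retrieval_cost` and `efficiency_weight` values are hypothetical and are not taken from the paper. The sketch only shows the general shape of the comparison: two continuations are rolled out from the same interaction prefix, one with retrieval and one without, and the retrieval decision is credited only if it improves the outcome or saves steps by more than it costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchOutcome:
    """Result of one continuation rolled out from a shared prefix (hypothetical type)."""
    success: bool  # whether the task was eventually solved
    steps: int     # number of environment steps the continuation used


def paired_branch_reward(
    with_retrieval: BranchOutcome,
    without_retrieval: BranchOutcome,
    retrieval_cost: float = 0.05,     # assumed fixed penalty for issuing a retrieval
    efficiency_weight: float = 0.01,  # assumed weight per environment step saved
) -> float:
    """Step-level reward for a retrieval decision (illustrative form, not the paper's)."""
    # Outcome term: +1 if only the retrieval branch succeeds, -1 if only the
    # no-retrieval branch succeeds, 0 if both branches end the same way.
    outcome_gain = float(with_retrieval.success) - float(without_retrieval.success)
    # Efficiency term: positive when the retrieval branch finishes in fewer steps.
    steps_saved = without_retrieval.steps - with_retrieval.steps
    return outcome_gain + efficiency_weight * steps_saved - retrieval_cost


if __name__ == "__main__":
    # Retrieval turns a failure into a success and shortens the episode.
    helpful = paired_branch_reward(BranchOutcome(True, 10), BranchOutcome(False, 20))
    # Retrieval changes nothing, so only its cost remains.
    redundant = paired_branch_reward(BranchOutcome(True, 12), BranchOutcome(True, 12))
    print(f"helpful retrieval reward:   {helpful:+.2f}")
    print(f"redundant retrieval reward: {redundant:+.2f}")
```

Under these assumed weights, a retrieval that does not change the outcome or the step count receives a negative reward, which is one simple way to express "retrieve only when it helps".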
