We present Mem-$π$, a framework for adaptive memory in large language model (LLM) agents, where useful guidance is generated on demand rather than retrieved from external memory stores. Existing memory-augmented agents typically rely on similarity-based retrieval from episodic memory banks or skill libraries, returning static entries that often misalign with the current context. In contrast, Mem-$π$ uses a dedicated language or vision-language model with its own parameters, separate from the downstream agent, to generate context-specific guidance for complex tasks. Conditioned on the current agent context, the model jointly decides when to produce guidance and what guidance to produce. We train it with a decision-content decoupled reinforcement learning (RL) objective, enabling it to abstain when generation would not help and otherwise produce concise, useful guidance. Across diverse agentic benchmarks spanning web navigation, terminal-based tool use, and text-based embodied interaction, Mem-$π$ consistently outperforms retrieval-based and prior RL-optimized memory baselines, achieving over 30% relative improvement on web navigation tasks.
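To make the "when to produce guidance and what to produce" interface concrete, the following is a minimal sketch of how a separate memory model's output might be gated before it reaches the downstream agent. It is an illustration under stated assumptions, not the authors' implementation: the `ABSTAIN` sentinel, the `GuidanceDecision` type, the prompt layout, and the rule-based `toy_memory_model` are all hypothetical, and the RL training objective is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical sentinel meaning "no guidance"; the abstract does not name one.
ABSTAIN = "<abstain>"


@dataclass
class GuidanceDecision:
    produce: bool            # the "when" decision
    guidance: Optional[str]  # the "what" content, None when abstaining


def parse_memory_output(raw: str) -> GuidanceDecision:
    """Split a single model output into a decision and its content."""
    text = raw.strip()
    if not text or text == ABSTAIN:
        return GuidanceDecision(produce=False, guidance=None)
    return GuidanceDecision(produce=True, guidance=text)


def build_agent_prompt(context: str, memory_model: Callable[[str], str]) -> str:
    """Condition the memory model on the agent context and attach guidance
    only when the model chose to produce it."""
    decision = parse_memory_output(memory_model(context))
    if not decision.produce:
        return context  # abstain: the agent sees its context unchanged
    return f"{context}\n\nGuidance: {decision.guidance}"


def toy_memory_model(context: str) -> str:
    """Stand-in for the trained model: a fixed rule, for illustration only."""
    if "shop" in context:
        return "Use the search box before browsing categories."
    return ABSTAIN


if __name__ == "__main__":
    print(build_agent_prompt("Task: shop for running shoes", toy_memory_model))
    print(build_agent_prompt("Task: open the settings page", toy_memory_model))
```

In this sketch the memory model is an ordinary callable with its own parameters, so it can be swapped or trained independently of the agent that consumes the prompt.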