Memory agents, which depart from predefined memory-processing pipelines by endogenously managing the processing, storage, and retrieval of memories, have garnered increasing attention for their autonomy and adaptability. However, existing training paradigms remain constrained: agents often traverse long-horizon sequences of memory operations before receiving sparse, delayed rewards, which hinders end-to-end optimization of memory-management policies. To address this limitation, we introduce Mem-T, an autonomous memory agent that interfaces with a lightweight hierarchical memory database to perform dynamic updates and multi-turn retrieval over streaming inputs. To effectively train long-horizon memory-management capabilities, we further propose MoT-GRPO, a tree-guided reinforcement learning framework that transforms sparse terminal feedback into dense, step-wise supervision via memory-operation-tree backpropagation and hindsight credit assignment, thereby enabling joint optimization of memory construction and retrieval. Extensive experiments demonstrate that Mem-T is (1) high-performing, surpassing frameworks such as A-Mem and Mem0 by up to $14.92\%$, and (2) economical, operating on a favorable accuracy-efficiency Pareto frontier and reducing inference tokens per query by approximately $24.45\%$ relative to GAM without sacrificing performance.