Maintaining consistency in long-term dialogues remains a fundamental challenge for LLMs, as standard retrieval mechanisms often fail to capture the temporal evolution of historical states. While memory-augmented frameworks offer a structured alternative, current systems rely on static prompting of closed-source models or suffer from ineffective training paradigms with sparse rewards. We introduce MemBuilder, a reinforcement learning framework that trains models to orchestrate multi-dimensional memory construction with attributed dense rewards. MemBuilder addresses two key challenges: (1) Sparse Trajectory-Level Rewards: we employ synthetic session-level question generation to provide dense intermediate rewards across extended trajectories; and (2) Multi-Dimensional Memory Attribution: we introduce contribution-aware gradient weighting that scales policy updates based on each component's downstream impact. Experimental results show that MemBuilder enables a 4B-parameter model to outperform state-of-the-art closed-source baselines, exhibiting strong generalization across long-term dialogue benchmarks.
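The first mechanism, dense intermediate rewards from synthetic session-level questions, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names (`session_reward`, `answer_fn`) and the dict-based memory are hypothetical, and the reward is assumed to be the fraction of synthetic questions answerable from the current memory state.

```python
def session_reward(memory, qa_pairs, answer_fn):
    """Return the fraction of synthetic questions answered correctly from
    the given memory state. Scoring each session separately yields a dense
    intermediate signal instead of a single sparse trajectory-level reward.
    """
    if not qa_pairs:
        return 0.0
    correct = sum(
        1 for question, gold in qa_pairs
        if answer_fn(memory, question) == gold
    )
    return correct / len(qa_pairs)

# Toy example: memory is a dict of extracted facts; the answerer is a lookup.
memory = {"user_pet": "cat", "user_city": "Oslo"}
qa = [("user_pet", "cat"), ("user_city", "Oslo"), ("user_job", "chef")]
reward = session_reward(memory, qa, lambda m, q: m.get(q))
# reward == 2/3: two of the three synthetic questions are answerable
```

In a full trajectory, one such reward per session would be accumulated, giving the policy feedback at every session boundary rather than only at the end.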
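The second mechanism, contribution-aware gradient weighting, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's exact formulation: it assumes each memory dimension has a scalar policy loss and a measured downstream contribution score (e.g., an accuracy delta from ablating that component), and that the scores are normalized into gradient-scaling weights via a softmax. All names and the softmax choice are hypothetical.

```python
import math

def contribution_weights(contributions, temperature=1.0):
    """Normalize downstream contribution scores into per-component weights
    with a softmax, so higher-impact components get larger gradient scale."""
    exps = [math.exp(c / temperature) for c in contributions]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_policy_loss(component_losses, contributions, temperature=1.0):
    """Scale each memory component's policy loss by its contribution weight
    and sum into one scalar objective for the policy update."""
    weights = contribution_weights(contributions, temperature)
    return sum(w * l for w, l in zip(weights, component_losses))

# Toy example: three memory dimensions with unequal downstream impact.
losses = [0.9, 1.2, 0.5]
contribs = [0.6, 0.1, 0.3]   # e.g., accuracy deltas from ablations
weights = contribution_weights(contribs)
loss = weighted_policy_loss(losses, contribs)
```

The effect is that components whose presence measurably improves downstream question answering dominate the policy update, while low-impact components are down-weighted rather than trained with a uniform sparse signal.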