Maintaining consistency in long-term dialogues remains a fundamental challenge for LLMs, as standard retrieval mechanisms often fail to capture the temporal evolution of historical states. While memory-augmented frameworks offer a structured alternative, current systems rely on static prompting of closed-source models or suffer from ineffective training paradigms with sparse rewards. We introduce MemBuilder, a reinforcement learning framework that trains models to orchestrate multi-dimensional memory construction with attributed dense rewards. MemBuilder addresses two key challenges: (1) Sparse Trajectory-Level Rewards: we employ synthetic session-level question generation to provide dense intermediate rewards across extended trajectories; and (2) Multi-Dimensional Memory Attribution: we introduce contribution-aware gradient weighting that scales policy updates based on each component's downstream impact. Experimental results show that MemBuilder enables a 4B-parameter model to outperform state-of-the-art closed-source baselines, exhibiting strong generalization across long-term dialogue benchmarks.
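The first mechanism, dense intermediate rewards from synthetic session-level questions, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names (`session_reward`, `answer_fn`) and the dict-based memory are hypothetical, and the reward is assumed to be the fraction of synthetic questions answerable from the current memory state.

```python
def session_reward(memory, qa_pairs, answer_fn):
    """Return the fraction of synthetic questions answered correctly from
    the given memory state. Scoring each session separately yields a dense
    intermediate signal instead of a single sparse trajectory-level reward.
    """
    if not qa_pairs:
        return 0.0
    correct = sum(
        1 for question, gold in qa_pairs
        if answer_fn(memory, question) == gold
    )
    return correct / len(qa_pairs)

# Toy example: memory is a dict of extracted facts; the answerer is a lookup.
memory = {"user_pet": "cat", "user_city": "Oslo"}
qa = [("user_pet", "cat"), ("user_city", "Oslo"), ("user_job", "chef")]
reward = session_reward(memory, qa, lambda m, q: m.get(q))
# reward == 2/3: two of the three synthetic questions are answerable
```

In a full trajectory, one such reward per session would be accumulated, giving the policy feedback at every session boundary rather than only at the end.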
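The second mechanism, contribution-aware gradient weighting, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's exact formulation: it assumes each memory dimension has a scalar policy loss and a measured downstream contribution score (e.g., an accuracy delta from ablating that component), and that the scores are normalized into gradient-scaling weights via a softmax. All names and the softmax choice are hypothetical.

```python
import math

def contribution_weights(contributions, temperature=1.0):
    """Normalize downstream contribution scores into per-component weights
    with a softmax, so higher-impact components get larger gradient scale."""
    exps = [math.exp(c / temperature) for c in contributions]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_policy_loss(component_losses, contributions, temperature=1.0):
    """Scale each memory component's policy loss by its contribution weight
    and sum into one scalar objective for the policy update."""
    weights = contribution_weights(contributions, temperature)
    return sum(w * l for w, l in zip(weights, component_losses))

# Toy example: three memory dimensions with unequal downstream impact.
losses = [0.9, 1.2, 0.5]
contribs = [0.6, 0.1, 0.3]   # e.g., accuracy deltas from ablations
weights = contribution_weights(contribs)
loss = weighted_policy_loss(losses, contribs)
```

The effect is that components whose presence measurably improves downstream question answering dominate the policy update, while low-impact components are down-weighted rather than trained with a uniform sparse signal.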