LLM-driven agents demonstrate strong performance in sequential decision-making but often rely on on-the-fly reasoning, re-deriving solutions even in recurring scenarios. This lack of experience reuse leads to computational redundancy and unstable execution. To bridge this gap, we propose ProcMEM, a framework that enables agents to autonomously learn procedural memory from interaction experience without parameter updates. By formalizing a Skill-MDP, ProcMEM transforms passive episodic narratives into executable Skills, each defined by activation, execution, and termination conditions. To achieve reliable reuse without capability degradation, we introduce Non-Parametric PPO, which leverages semantic gradients for high-quality candidate generation and a PPO Gate for robust Skill verification. Through score-based maintenance, ProcMEM sustains a compact, high-quality procedural memory. Experiments across in-domain, cross-task, and cross-agent settings show that ProcMEM achieves superior reuse rates and significant performance gains under extreme memory compression. Visualized evolutionary trajectories and Skill distributions further reveal how ProcMEM transparently accumulates, refines, and reuses procedural knowledge to support long-term autonomy.
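To make the Skill abstraction concrete, the following is a minimal, hypothetical sketch of what a Skill record and score-based maintenance might look like. All names (`Skill`, `ProceduralMemory`, `capacity`, the example conditions) are illustrative assumptions, not the paper's actual implementation; the abstract specifies only that a Skill carries activation, execution, and termination conditions and that memory is kept compact via scores.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical Skill record: a procedure plus the three conditions the
# abstract names (activation, execution, termination) and a quality score.
@dataclass
class Skill:
    name: str
    activation: Callable[[Dict], bool]   # does this Skill apply to the state?
    execute: Callable[[Dict], Dict]      # the procedure itself
    termination: Callable[[Dict], bool]  # when to stop and return control
    score: float = 0.0                   # running quality estimate for maintenance

class ProceduralMemory:
    """Compact store with score-based maintenance: when capacity is
    exceeded, the lowest-scoring Skills are pruned."""

    def __init__(self, capacity: int = 16):
        self.capacity = capacity
        self.skills: List[Skill] = []

    def add(self, skill: Skill) -> None:
        self.skills.append(skill)
        # keep memory compact and high-quality: sort by score, drop the tail
        self.skills.sort(key=lambda s: s.score, reverse=True)
        del self.skills[self.capacity:]

    def retrieve(self, state: Dict) -> Optional[Skill]:
        # return the highest-scoring Skill whose activation condition fires
        for skill in self.skills:
            if skill.activation(state):
                return skill
        return None
```

A retrieved Skill would be run until its termination condition holds, letting the agent replay a verified procedure instead of re-deriving it step by step:

```python
mem = ProceduralMemory(capacity=8)
mem.add(Skill("open_door",
              activation=lambda s: s.get("at_door", False),
              execute=lambda s: {**s, "door_open": True},
              termination=lambda s: s.get("door_open", False),
              score=0.9))
skill = mem.retrieve({"at_door": True})  # reuse instead of re-deriving
```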