The hallmark of human intelligence is the self-evolving ability to master new skills by learning from past experience. However, current AI agents struggle to emulate this self-evolution: fine-tuning is computationally expensive and prone to catastrophic forgetting, while existing memory-based methods rely on passive semantic matching that often retrieves noise. To address these challenges, we propose MemRL, a non-parametric approach that evolves via reinforcement learning over episodic memory. By decoupling stable reasoning from plastic memory, MemRL employs a Two-Phase Retrieval mechanism that filters noise and identifies high-utility strategies through environmental feedback. Extensive experiments on HLE, BigCodeBench, ALFWorld, and Lifelong Agent Bench demonstrate that MemRL significantly outperforms state-of-the-art baselines, confirming that it effectively reconciles the stability-plasticity dilemma and enables continuous runtime improvement without weight updates. Code is available at https://github.com/MemTensor/MemRL.
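To make the two-phase idea concrete, here is a minimal, hypothetical sketch (all class and function names are illustrative, not the paper's actual API). Phase 1 performs passive semantic recall, which may surface noisy matches; Phase 2 re-ranks the candidates by a running utility estimate that is updated from environmental reward, so the agent improves at runtime by adjusting memory utilities rather than model weights.

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    embedding: list       # toy embedding vector
    utility: float = 0.0  # running utility estimate learned from feedback
    uses: int = 0

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class EpisodicStore:
    def __init__(self, memories):
        self.memories = list(memories)

    def retrieve(self, query_emb, k_recall=4, k_final=2):
        # Phase 1: passive semantic matching (may include noise).
        candidates = sorted(
            self.memories,
            key=lambda m: cosine(query_emb, m.embedding),
            reverse=True,
        )[:k_recall]
        # Phase 2: re-rank by learned utility to keep high-value strategies.
        return sorted(candidates, key=lambda m: m.utility, reverse=True)[:k_final]

    def update(self, used, reward, lr=0.5):
        # Non-parametric evolution: environmental reward updates memory
        # utilities; the base model's weights are never touched.
        for m in used:
            m.uses += 1
            m.utility += lr * (reward - m.utility)
```

In this sketch the utility update is a simple exponential moving average toward the observed reward, standing in for whatever value-estimation rule the paper actually uses; the point is that filtering shifts from similarity alone to similarity plus environment-validated utility.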