Effective decision-making in the real world depends on memory that is both stable and adaptive: environments change over time, and agents must retain relevant information over long horizons while also updating or overwriting outdated content when circumstances shift. Existing reinforcement learning (RL) benchmarks and memory-augmented agents focus primarily on retention, leaving the equally critical ability of memory rewriting largely unexplored. To address this gap, we introduce a benchmark that explicitly tests continual memory updating under partial observability, i.e., the natural setting where an agent must rely on memory rather than current observations, and use it to compare recurrent, transformer-based, and structured memory architectures. Our experiments reveal that classic recurrent models, despite their simplicity, are more flexible and robust in memory rewriting tasks than modern structured memories, which succeed only under narrow conditions, and transformer-based agents, which often fail beyond trivial retention cases. These findings expose a fundamental limitation of current approaches and emphasize the necessity of memory mechanisms that balance stable retention with adaptive updating. Our work highlights this overlooked challenge, introduces benchmarks to evaluate it, and offers insights for designing future RL agents with explicit and trainable forgetting mechanisms. Code: https://quartz-admirer.github.io/Memory-Rewriting/