Memory has emerged as a core module in large language model (LLM)-based agents for long-horizon, complex tasks (e.g., multi-turn dialogue, game playing, and scientific discovery), where it enables knowledge accumulation, iterative reasoning, and self-evolution. A number of memory methods have been proposed in the literature, but they have not been systematically and comprehensively compared under the same experimental settings. In this paper, we first present a unified framework that incorporates the existing agent memory methods from a high-level perspective. We then extensively compare representative agent memory methods on two well-known benchmarks, examine the effectiveness of each method, and provide a thorough analysis. As a byproduct of our experimental analysis, we also design a new memory method by combining modules from the existing methods, and it outperforms the state-of-the-art methods. Finally, based on these findings, we outline promising directions for future research. We believe that a deeper understanding of the behavior of existing methods can provide valuable new insights for future work.