Episodic memory, a central component of human memory, is the ability to recall coherent events grounded in who, when, and where. However, most agent memory systems emphasize only semantic recall and store experience in structures such as key-value pairs, vectors, or graphs, which makes it hard for them to represent and retrieve coherent events. To address this challenge, we propose CAST, a Character-and-Scene based memory architecture inspired by dramatic theory. Specifically, CAST constructs 3D scenes (time/place/topic) and organizes them into character profiles, each summarizing the events of one character, to represent episodic memory. CAST then complements this episodic memory with a graph-based semantic memory, yielding a robust dual memory design. Experiments show that, averaged over various datasets, CAST improves F1 by 8.11% and J (LLM-as-a-Judge) by 10.21% over baselines, with the largest gains on open and time-sensitive conversational questions.
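The dual memory design described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Scene`, `CharacterProfile`, `DualMemory`, `add_scene`, `add_fact`, and `recall` are hypothetical, times are assumed to be ISO date strings, and the semantic graph is reduced to a dictionary of (relation, object) pairs. It shows only how scenes indexed by time, place, and topic might be grouped under character profiles and queried by who, when, and where.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scene:
    """One episodic unit indexed along three axes: time, place, topic."""
    time: str          # assumed ISO date, so string order is chronological
    place: str
    topic: str
    summary: str
    characters: tuple  # names of the characters present in the scene


@dataclass
class CharacterProfile:
    """Collects the scenes that one character took part in."""
    name: str
    scenes: list = field(default_factory=list)

    def timeline(self):
        # Scenes ordered from earliest to latest.
        return sorted(self.scenes, key=lambda s: s.time)


class DualMemory:
    """Hypothetical container pairing episodic and semantic memory."""

    def __init__(self):
        self.profiles = {}             # episodic: name -> CharacterProfile
        self.graph = defaultdict(set)  # semantic: subject -> {(relation, object)}

    def add_scene(self, scene):
        # File the scene under every character who appears in it.
        for name in scene.characters:
            profile = self.profiles.setdefault(name, CharacterProfile(name))
            profile.scenes.append(scene)

    def add_fact(self, subject, relation, obj):
        self.graph[subject].add((relation, obj))

    def recall(self, who, since=None, place=None):
        """Return a character's scenes in time order, optionally filtered."""
        profile = self.profiles.get(who)
        if profile is None:
            return []
        return [
            s for s in profile.timeline()
            if (since is None or s.time >= since)
            and (place is None or s.place == place)
        ]


memory = DualMemory()
memory.add_scene(Scene("2023-05-02", "cafe", "job offer",
                       "Alice tells Bob about a new job.", ("Alice", "Bob")))
memory.add_scene(Scene("2023-03-10", "park", "hiking",
                       "Alice plans a hiking trip.", ("Alice",)))
memory.add_fact("Alice", "works_at", "Acme")

recent = memory.recall("Alice", since="2023-04-01")
```

Keying retrieval on the character first and filtering by time or place afterwards mirrors the who/when/where framing of episodic recall, while static facts such as `works_at` stay in the separate graph.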