AI Clones aim to simulate an individual's thoughts and behaviors to enable long-term, personalized interaction, placing stringent demands on memory systems to model experiences, emotions, and opinions over time. Existing memory benchmarks primarily rely on user-agent conversational histories, which are temporally fragmented and insufficient for capturing continuous life trajectories. We introduce CloneMem, a benchmark for evaluating long-term memory in AI Clone scenarios grounded in non-conversational digital traces, including diaries, social media posts, and emails, spanning one to three years. CloneMem adopts a hierarchical data construction framework to ensure longitudinal coherence and defines tasks that assess an agent's ability to track evolving personal states. Experiments show that current memory mechanisms struggle in this setting, highlighting open challenges for life-grounded personalized AI. Code and dataset are available at https://github.com/AvatarMemory/CloneMemBench