MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents

Large Language Model (LLM) agents require persistent memory to maintain personalization, factual continuity, and long-horizon reasoning, yet standard context-window and retrieval-augmented generation (RAG) pipelines degrade over multi-session interactions. We present MemMachine, an open-source memory system that integrates short-term, long-term episodic, and profile memory within a ground-truth-preserving architecture that stores entire conversational episodes and reduces lossy LLM-based extraction. MemMachine uses contextualized retrieval that expands nucleus matches with surrounding context, improving recall when relevant evidence spans multiple dialogue turns. Across benchmarks, MemMachine achieves strong accuracy-efficiency tradeoffs. On LoCoMo it reaches a score of 0.9169 using GPT-4.1-mini. On LongMemEval_S (ICLR 2025), a six-dimension ablation yields 93.0 percent accuracy, with retrieval-stage optimizations (retrieval depth tuning, +4.2 percent; context formatting, +2.0 percent; search prompt design, +1.8 percent; and query bias correction, +1.4 percent) outperforming ingestion-stage gains such as sentence chunking (+0.8 percent). GPT-5-mini exceeds GPT-5 by 2.6 percent when paired with optimized prompts, making it the most cost-efficient setup. Compared to Mem0, MemMachine uses roughly 80 percent fewer input tokens under matched conditions. A companion Retrieval Agent adaptively routes queries among direct retrieval, parallel decomposition, and iterative chain-of-query strategies, achieving 93.2 percent on HotpotQA-hard and 92.6 percent on WikiMultiHop under randomized-noise conditions. These results show that preserving episodic ground truth while layering adaptive retrieval yields robust, efficient long-term memory for personalized LLM agents.
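
To make the idea of contextualized retrieval concrete, the following is a minimal sketch of how a set of matched turns might be expanded with their neighbors. It is an illustration under stated assumptions, not MemMachine's implementation: the function names `expand_matches` and `retrieve_with_context`, the fixed symmetric `window` parameter, and the merging of overlapping or adjacent ranges are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def expand_matches(
    match_indices: Sequence[int], num_turns: int, window: int = 1
) -> List[Tuple[int, int]]:
    """Expand each matched turn by `window` turns on both sides.

    Returns inclusive (start, end) ranges, clipped to the episode bounds,
    with overlapping or adjacent ranges merged. Hypothetical helper.
    """
    ranges: List[Tuple[int, int]] = []
    for i in sorted(set(match_indices)):
        if not 0 <= i < num_turns:
            continue  # ignore indices outside the episode
        start = max(0, i - window)
        end = min(num_turns - 1, i + window)
        if ranges and start <= ranges[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


def retrieve_with_context(
    turns: Sequence[str], match_indices: Sequence[int], window: int = 1
) -> List[List[str]]:
    """Return one contiguous span of turns per merged range."""
    return [
        list(turns[start : end + 1])
        for start, end in expand_matches(match_indices, len(turns), window)
    ]


if __name__ == "__main__":
    episode = [f"turn {n}" for n in range(10)]
    # Matches at turns 2, 3, and 8 come from some upstream search step.
    for span in retrieve_with_context(episode, [2, 3, 8], window=1):
        print(span)
```

In this sketch the matched turns come from an unspecified upstream search step. Merging neighboring ranges keeps evidence that spans several consecutive turns in a single span and avoids returning the same turn twice.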
