Long-term memory is the missing layer for LLM agents: they forget across sessions, and the common workaround, replaying the whole history into the prompt, is expensive, slow, and less accurate as distractors accumulate. Most memory systems win on cost or latency but still trail the full-context baseline on accuracy, and benchmark numbers come from inconsistent, non-reproducible harnesses, so the same system is reported at widely different scores depending on the source. We present Engram, an open-source, dual-process memory engine built on a bi-temporal data model. A fast write path appends lossless episodes with no LLM call on the critical path; an asynchronous path extracts atomic (subject, predicate, object) facts, builds a bi-temporal knowledge graph, and resolves contradictions without a per-fact LLM call. Superseded facts are invalidated rather than deleted, so every fact keeps its provenance and a supersession chain. A hybrid read path fuses dense, lexical, graph, and recency/salience signals, applies a point-in-time ("as-of") filter, and assembles a compact, provenance-tagged context. On the full 500-question LongMemEval_S, graded by the official category-specific judge, Engram's lean configuration, which answers from a retrieved slice of about 9.6k tokens rather than the full history, scores 83.6% versus 73.2% for the full-context baseline (+10.4 points, McNemar p < 10^-6) while using about 8x fewer tokens (9.6k vs. 79k), with no errored questions (0/500). The gain depends on the hybrid read path: facts alone lose recall, whereas facts plus retrieved chunks recover the missing detail. We also contribute a neutral, in-repo evaluation harness with the official judge built in and the full-context baseline reported in every table, publish the raw per-question logs, and document the measurement-integrity pitfalls (truncation, home-grown judges, full-history leaks) that silently distort memory benchmarks. Every number ships with a command that reproduces it.
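The write-side rule "invalidate, never delete" and the read-side as-of filter can be illustrated with a small in-memory sketch. It assumes each fact carries a valid-time interval (when it held in the world) plus transaction times (when the store learned the fact and when it closed the interval). The names `Fact`, `BiTemporalStore`, `assert_fact`, `as_of`, and `chain` are hypothetical, and the contradiction rule (same subject and predicate, different object) is a simplified stand-in, not Engram's actual resolver.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

INF = float("inf")


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: float                    # valid time: when the fact became true
    valid_to: float = INF                # valid time: when it stopped being true
    recorded_at: float = 0.0             # transaction time: when the store learned it
    invalidated_at: float = INF          # transaction time: when valid_to was closed
    superseded_by: Optional[int] = None  # id of the fact that replaced this one
    source: str = ""                     # provenance, e.g. an episode id


class BiTemporalStore:
    """Hypothetical append-only fact store: facts are invalidated, never deleted."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, fact: Fact) -> int:
        new_id = len(self.facts)
        for old in self.facts:
            # Simplified contradiction rule (an assumption): same subject and
            # predicate, different object, old fact still open and not newer.
            if ((old.subject, old.predicate) == (fact.subject, fact.predicate)
                    and old.obj != fact.obj
                    and old.valid_to == INF
                    and old.valid_from <= fact.valid_from):
                old.valid_to = fact.valid_from         # close the world-time interval
                old.invalidated_at = fact.recorded_at  # remember when it was closed
                old.superseded_by = new_id             # extend the supersession chain
        self.facts.append(fact)
        return new_id

    def as_of(self, valid_time: float, known_at: float = INF) -> list[Fact]:
        """Facts true at valid_time, according to what the store knew at known_at."""
        result = []
        for f in self.facts:
            if f.recorded_at > known_at:
                continue  # not yet learned at known_at
            # An invalidation recorded after known_at was not yet visible then.
            end = f.valid_to if f.invalidated_at <= known_at else INF
            if f.valid_from <= valid_time < end:
                result.append(f)
        return result

    def chain(self, fact_id: int) -> list[int]:
        """Follow superseded_by links from a fact to its latest replacement."""
        ids = [fact_id]
        while self.facts[ids[-1]].superseded_by is not None:
            ids.append(self.facts[ids[-1]].superseded_by)
        return ids


store = BiTemporalStore()
a = store.assert_fact(Fact("alice", "lives_in", "Paris", valid_from=1, recorded_at=1, source="ep-1"))
b = store.assert_fact(Fact("alice", "lives_in", "Berlin", valid_from=5, recorded_at=6, source="ep-7"))
print([f.obj for f in store.as_of(3)])              # ['Paris']
print([f.obj for f in store.as_of(7)])              # ['Berlin']
print([f.obj for f in store.as_of(7, known_at=5)])  # ['Paris'] (move not yet recorded)
print(store.chain(a), len(store.facts))             # [0, 1] 2
```

Because the old row is only closed, never removed, a query whose `known_at` precedes the update still returns what the store believed at that time, and `chain` recovers the supersession history from the original fact.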
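How the four read-path signals are weighted is not specified here. As an illustration only, the sketch below uses weighted reciprocal rank fusion (RRF), a standard rank-fusion technique swapped in as an assumption, then greedily packs the fused items into a token budget with a provenance tag on each. The signal weights, `k = 60`, the item ids, and the whitespace token estimate are hypothetical; an as-of filter would run before this step and is omitted.

```python
from __future__ import annotations

from collections import defaultdict


def rrf_fuse(rankings: dict[str, list[str]],
             weights: dict[str, float],
             k: int = 60) -> list[str]:
    """Weighted RRF: score(d) = sum over signals s of w_s / (k + rank_s(d))."""
    scores: dict[str, float] = defaultdict(float)
    for signal, ranked_ids in rankings.items():
        w = weights.get(signal, 1.0)
        for rank, item_id in enumerate(ranked_ids, start=1):
            scores[item_id] += w / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda i: (-scores[i], i))


def assemble_context(order: list[str], texts: dict[str, str],
                     sources: dict[str, str], budget: int) -> str:
    """Greedily pack items in fused order, each tagged with its provenance."""
    lines, used = [], 0
    for item_id in order:
        cost = len(texts[item_id].split())  # crude token estimate: whitespace words
        if used + cost > budget:
            continue  # skip what does not fit; a shorter later item still might
        lines.append(f"[{sources[item_id]}] {texts[item_id]}")
        used += cost
    return "\n".join(lines)


rankings = {  # hypothetical per-signal rankings of facts (f*) and chunks (c*)
    "dense": ["f1", "c2", "f3"],
    "lexical": ["c2", "f1"],
    "graph": ["f3"],
    "recency": ["c2", "f3", "f1"],
}
weights = {"dense": 1.0, "lexical": 1.0, "graph": 0.5, "recency": 0.5}
order = rrf_fuse(rankings, weights)
print(order)  # ['c2', 'f1', 'f3']
texts = {"c2": "Alice moved to Berlin in May",
         "f1": "alice lives_in Berlin",
         "f3": "alice works_at Acme"}
sources = {"c2": "ep-7", "f1": "fact:f1", "f3": "fact:f3"}
print(assemble_context(order, texts, sources, budget=9))
```

Rank-based fusion avoids calibrating raw scores across heterogeneous retrievers, which is why it is a common default; in this toy run the retrieved chunk and the extracted fact both make the budget, mirroring the facts-plus-chunks combination the abstract credits for the gain.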