Large Language Models (LLMs) still suffer from severe hallucinations and catastrophic forgetting when performing causal reasoning over massive, fragmented long contexts. Existing memory mechanisms typically treat retrieval as a static, single-step, passive matching process, which leads to semantic dilution and contextual fragmentation. To overcome these bottlenecks, we propose MemCoT, a test-time memory scaling framework that recasts long-context reasoning as an iterative, stateful information search. MemCoT introduces a multi-view long-term memory perception module that supports Zoom-In evidence localization and Zoom-Out contextual expansion, allowing the model first to identify where relevant evidence resides and then to reconstruct the surrounding causal structure needed for reasoning. In addition, MemCoT employs a task-conditioned dual short-term memory system composed of semantic state memory and episodic trajectory memory. This short-term memory records historical search decisions and dynamically guides query decomposition and pruning across iterations. Empirical evaluations show that, equipped with MemCoT, several open- and closed-source models achieve state-of-the-art performance on the LoCoMo and LongMemEval-S benchmarks.
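The abstract does not specify how MemCoT implements its search loop, so the following is only a toy sketch, under our own assumptions, of the general pattern it describes: Zoom-In localizes a chunk, Zoom-Out expands to its neighbours, and a two-part short-term memory (gathered evidence plus a per-step query trajectory) steers the next query and prunes chunks already read. All names here (`ShortTermMemory`, `zoom_in`, `zoom_out`, `memory_search`, `window`, `STOP`) are hypothetical, and simple token-overlap scoring stands in for whatever retriever and LLM-driven query decomposition MemCoT actually uses.

```python
from dataclasses import dataclass, field

# Hypothetical stop-word list used only by this toy retriever.
STOP = {"a", "an", "the", "in", "to", "of", "there", "she", "he", "was", "did", "where"}


def tokens(text: str) -> set[str]:
    """Lower-cased bag of words with stop words removed."""
    return {w for w in text.lower().split() if w not in STOP}


@dataclass
class ShortTermMemory:
    # Toy stand-in for semantic state memory: evidence gathered so far.
    semantic_state: list[str] = field(default_factory=list)
    # Toy stand-in for episodic trajectory memory: (query, chunk ids read) per step.
    trajectory: list[tuple[str, list[int]]] = field(default_factory=list)


def zoom_in(query: str, chunks: list[str], visited: set[int], k: int = 1) -> list[int]:
    """Localize evidence: top-k unvisited chunks by token overlap (zero overlap dropped)."""
    q = tokens(query)
    scored = [(len(q & tokens(c)), i) for i, c in enumerate(chunks) if i not in visited]
    scored = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]


def zoom_out(hit: int, n_chunks: int, window: int = 1) -> list[int]:
    """Expand context: the hit plus `window` neighbouring chunks on each side."""
    return list(range(max(0, hit - window), min(n_chunks, hit + window + 1)))


def memory_search(question: str, chunks: list[str], max_steps: int = 3) -> ShortTermMemory:
    mem = ShortTermMemory()
    visited: set[int] = set()  # pruning: never re-localize a chunk already read
    query = question
    for _ in range(max_steps):
        hits = zoom_in(query, chunks, visited)
        if not hits:
            break  # nothing new matches the current query; stop searching
        context = sorted({j for h in hits for j in zoom_out(h, len(chunks))})
        visited.update(context)
        mem.trajectory.append((query, context))
        for j in context:
            if chunks[j] not in mem.semantic_state:
                mem.semantic_state.append(chunks[j])
        # Toy query refinement: follow terms from the new context that the question lacked.
        new_terms = set().union(*(tokens(chunks[j]) for j in context)) - tokens(question)
        query = " ".join(sorted(new_terms))
    return mem


chunks = [
    "Alice moved to Paris in 2019",
    "She started a bakery there",
    "The weather was rainy",
    "Bob visited the bakery in June",
    "Bob likes chess",
    "Carol plays piano",
]
mem = memory_search("Where did Alice move", chunks)
for query, ids in mem.trajectory:
    print(query, ids)
# Where did Alice move [0, 1]
# 2019 bakery moved paris started [2, 3, 4]
```

In this toy run, the first Zoom-Out pulls in "She started a bakery there", and the term "bakery" from that expanded context is what lets the second step reach Bob's visit; a single-step matcher on the original question would not. This is meant only to illustrate why stateful, multi-step search can link fragmented evidence, not to reproduce MemCoT's behaviour.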