Large Language Models (LLMs) have achieved impressive reasoning abilities but struggle with temporal understanding, especially when questions involve multiple entities, compound operators, and evolving event sequences. Temporal Knowledge Graphs (TKGs), which capture vast amounts of temporal facts in a structured format, offer a reliable source for temporal reasoning. However, existing TKG-based LLM reasoning methods still face four major challenges: maintaining temporal faithfulness in multi-hop reasoning, achieving multi-entity temporal synchronization, adapting retrieval to diverse temporal operators, and reusing prior reasoning experience for stability and efficiency. To address these issues, we propose MemoTime, a memory-augmented temporal knowledge graph framework that enhances LLM reasoning through structured grounding, recursive reasoning, and continual experience learning. MemoTime decomposes complex temporal questions into a hierarchical Tree of Time, enabling operator-aware reasoning that enforces monotonic timestamps and co-constrains multiple entities under unified temporal bounds. A dynamic evidence-retrieval layer adaptively selects operator-specific retrieval strategies, while a self-evolving experience memory stores verified reasoning traces, toolkit decisions, and sub-question embeddings for cross-type reuse. Comprehensive experiments on multiple temporal QA benchmarks show that MemoTime achieves overall state-of-the-art results, outperforming the strongest baseline by up to 24.0%. Furthermore, MemoTime enables smaller models (e.g., Qwen3-4B) to achieve reasoning performance comparable to that of GPT-4-Turbo.