Temporal knowledge graph question answering (TKGQA) aims to answer time-sensitive questions by leveraging temporal knowledge bases. While Large Language Models (LLMs) demonstrate significant potential in TKGQA, current prompting strategies constrain their efficacy in two primary ways. First, they are prone to reasoning hallucinations under complex temporal constraints. Second, static prompting limits model autonomy and generalization, as it lacks optimization through dynamic interaction with temporal knowledge graph (TKG) environments. To address these limitations, we propose \textbf{TKG-Thinker}, a novel agent equipped with autonomous planning and adaptive retrieval capabilities for reasoning over TKGs. Specifically, TKG-Thinker performs in-depth temporal reasoning through dynamic multi-turn interactions with TKGs via a dual-stage training strategy. We first apply Supervised Fine-Tuning (SFT) with chain-of-thought data to instill core planning capabilities, followed by a Reinforcement Learning (RL) stage that leverages multi-dimensional rewards to refine reasoning policies under intricate temporal constraints. Experimental results on benchmark datasets with three open-source LLMs show that TKG-Thinker achieves state-of-the-art performance and exhibits strong generalization across complex TKGQA settings.