Temporal Knowledge Graph Question Answering (TKGQA) is inherently challenging, as it requires sophisticated reasoning over dynamic facts with multi-hop dependencies and complex temporal constraints. Existing methods rely on fixed workflows and expensive closed-source APIs, limiting flexibility and scalability. We propose Temp-R1, the first autonomous end-to-end agent for TKGQA trained through reinforcement learning. To address cognitive overload in single-action reasoning, we expand the action space with specialized internal actions alongside external actions. To prevent shortcut learning on simple questions, we introduce reverse curriculum learning, which trains on difficult questions first, forcing the development of sophisticated reasoning before transferring to easier cases. Our 8B-parameter Temp-R1 achieves state-of-the-art performance on MultiTQ and TimelineKGQA, improving by 19.8% over strong baselines on complex questions. Our work establishes a new paradigm for autonomous temporal reasoning agents. Our code will be publicly available soon at https://github.com/zjukg/Temp-R1.