Rapid advances in large language models (LLMs) have sparked interest in the temporal knowledge graph (tKG) domain, where conventional embedding-based and rule-based methods still dominate. It remains an open question whether pre-trained LLMs can understand structured temporal relational data and replace these methods as the foundation model for temporal relational forecasting. We therefore bring temporal knowledge forecasting into the generative setting. Two gaps make this difficult: one between the complex graph structure of temporal data and the sequential natural-language expressions that LLMs can handle, and one between the enormous size of tKGs and the heavy computational cost of fine-tuning LLMs. To address them, we propose GenTKG, a novel retrieval-augmented generation framework that combines a temporal logical rule-based retrieval strategy with few-shot parameter-efficient instruction tuning, targeting the first and second gap respectively. Extensive experiments show that GenTKG outperforms conventional methods of temporal relational forecasting at low computational cost, using extremely limited training data of as few as 16 samples. GenTKG also exhibits remarkable cross-domain generalizability, with superior performance on unseen datasets without re-training, and in-domain generalizability regardless of the time split within the same dataset. Our work reveals the huge potential of LLMs in the tKG domain and opens a new frontier for generative forecasting on tKGs. Code and data are released here: https://github.com/mayhugotong/GenTKG.
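To make the generative setting concrete, the following is a minimal sketch of how retrieved temporal facts might be serialized into a sequential prompt for an LLM. It is an illustration under stated assumptions, not the GenTKG implementation: the function names `retrieve_history` and `build_prompt`, the quadruple layout, the prompt format, and the sample facts are all hypothetical, and the retrieval step is a simple same-subject, same-relation filter standing in for the temporal logical rule-based retrieval described above.

```python
from typing import List, Tuple

# A temporal fact as a quadruple: (subject, relation, object, timestamp).
# This layout is an assumption made for illustration.
Fact = Tuple[str, str, str, int]


def retrieve_history(facts: List[Fact], subject: str, relation: str,
                     query_time: int, k: int = 3) -> List[Fact]:
    """Return up to k most recent facts before query_time that share the
    query subject and relation (a simplified stand-in for rule-based
    retrieval, not the method of the paper)."""
    candidates = [
        f for f in facts
        if f[0] == subject and f[1] == relation and f[3] < query_time
    ]
    candidates.sort(key=lambda f: f[3])  # oldest first
    return candidates[-k:]               # keep the k most recent


def build_prompt(history: List[Fact], subject: str, relation: str,
                 query_time: int) -> str:
    """Serialize retrieved facts plus the open query into plain text,
    leaving the object slot for the language model to complete."""
    lines = [f"{t}: [{s}, {r}, {o}]" for s, r, o, t in history]
    lines.append(f"{query_time}: [{subject}, {relation},")
    return "\n".join(lines)


# Hypothetical sample data, invented for this sketch.
facts = [
    ("Germany", "Consult", "France", 1),
    ("Germany", "Visit", "Italy", 2),
    ("Germany", "Consult", "Poland", 3),
    ("France", "Consult", "Germany", 4),
]
history = retrieve_history(facts, "Germany", "Consult", query_time=5, k=2)
prompt = build_prompt(history, "Germany", "Consult", query_time=5)
print(prompt)
```

The prompt ends with an unfinished quadruple, so forecasting reduces to ordinary text completion. Bounding the history to `k` facts keeps the input short, which is one plausible way to limit the cost of processing a large tKG.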