The rapid advancement of large language models (LLMs) has sparked interest in the temporal knowledge graph (tKG) domain, where conventional embedding-based and rule-based methods dominate. It remains an open question whether pre-trained LLMs can understand structured temporal relational data and replace these methods as the foundation model for temporal relational forecasting. We therefore bring temporal knowledge forecasting into the generative setting. Two challenges arise. First, there is a wide gap between the complex structure of temporal graph data and the sequential natural-language input that LLMs can handle. Second, there is a tension between the enormous size of tKGs and the heavy computational cost of fine-tuning LLMs. To address them, we propose GenTKG, a novel retrieval-augmented generation framework that combines a temporal logical rule-based retrieval strategy with few-shot parameter-efficient instruction tuning, targeting the first and second challenge respectively. Extensive experiments show that GenTKG outperforms conventional temporal relational forecasting methods while requiring low computational resources and extremely limited training data (as few as 16 samples). GenTKG also exhibits remarkable cross-domain generalizability, achieving superior performance on unseen datasets without retraining, as well as in-domain generalizability regardless of the time split within the same dataset. Our work reveals the great potential of LLMs in the tKG domain and opens a new frontier for generative forecasting on tKGs.
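The abstract does not spell out how retrieved temporal facts are turned into LLM input. The following is a minimal Python sketch of the general idea, under stated assumptions: facts are stored as (subject, relation, object, timestamp) quadruples, and mined temporal logical rules are given as a hypothetical mapping from a head relation to the body relations that count as evidence for it. The sketch keeps a subject's past facts whose relation is the query relation or a rule body relation, retains the most recent ones, and serializes them into a one-fact-per-line prompt ending with the incomplete query for the LLM to complete. The names `Fact`, `retrieve_history`, and `build_prompt` are illustrative and are not GenTKG's actual code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Fact:
    """One tKG quadruple: (subject, relation, object, timestamp)."""
    subject: str
    relation: str
    obj: str
    time: int


# Hypothetical rule format (an assumption): head relation -> body relations
# whose past occurrences are treated as evidence for the head.
Rules = Dict[str, List[str]]


def retrieve_history(facts: List[Fact], rules: Rules, subject: str,
                     relation: str, query_time: int,
                     max_facts: int = 5) -> List[Fact]:
    """Return up to `max_facts` most recent past facts about `subject` whose
    relation is the query relation or a body relation of a rule with that head."""
    allowed = {relation, *rules.get(relation, [])}
    candidates = [
        f for f in facts
        if f.subject == subject and f.relation in allowed and f.time < query_time
    ]
    candidates.sort(key=lambda f: f.time)  # oldest first
    return candidates[-max_facts:]         # keep only the most recent ones


def build_prompt(history: List[Fact], subject: str, relation: str,
                 query_time: int) -> str:
    """Serialize retrieved facts into text; the last line is the open query."""
    lines = [f"{f.time}: [{f.subject}, {f.relation}, {f.obj}]" for f in history]
    lines.append(f"{query_time}: [{subject}, {relation},")
    return "\n".join(lines)


# Toy example with made-up entities and relations.
facts = [
    Fact("A", "consult", "B", 1),
    Fact("A", "visit", "C", 2),
    Fact("A", "sign_agreement", "B", 3),
    Fact("D", "consult", "B", 3),
    Fact("A", "consult", "E", 5),
]
rules: Rules = {"consult": ["visit"]}
history = retrieve_history(facts, rules, "A", "consult", query_time=4)
print(build_prompt(history, "A", "consult", 4))
```

Filtering by rule body relations keeps the prompt short, which matters when the serialized history must fit in an LLM context window. A real system would presumably also rank by rule confidence rather than recency alone.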