Rapid advances in large language models (LLMs) have ignited interest in the temporal knowledge graph (tKG) domain, where carefully designed embedding-based and rule-based models have conventionally dominated. It remains an open question whether pre-trained LLMs can understand structured temporal relational data and replace these conventional models as the foundation model for temporal relational forecasting. We therefore bring temporal knowledge forecasting into the generative setting. Two gaps make this challenging: the gap between the complex graph structure of temporal data and the sequential natural-language expressions that LLMs can handle, and the gap between the enormous size of tKGs and the heavy computational cost of fine-tuning LLMs. To address these challenges, we propose GenTKG, a novel retrieval-augmented generation framework for generative forecasting on tKGs that combines a temporal logical rule-based retrieval strategy with lightweight, parameter-efficient instruction tuning. Extensive experiments show that GenTKG outperforms conventional methods of temporal relational forecasting while using low computational resources. GenTKG also exhibits remarkable transferability, achieving strong performance on unseen datasets without re-training. Our work reveals the huge potential of LLMs in the tKG domain and opens a new frontier for generative forecasting on tKGs.
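To make the gap between graph-structured temporal facts and sequential LLM input more concrete, the following is a minimal sketch of rule-guided retrieval followed by prompt construction. It is an illustration under stated assumptions, not the GenTKG implementation: the abstract does not specify the rule format, the ranking criterion, or the prompt layout, so the `Fact` tuple, the `RULES` table, the recency-based top-k selection, and the functions `retrieve` and `build_prompt` are all hypothetical.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One tKG quadruple: (subject, relation, object, timestamp)."""
    subject: str
    relation: str
    obj: str
    time: int


# Hypothetical rule table (an assumption, not taken from the paper):
# each query relation maps to earlier relations assumed to precede it.
RULES = {
    "make_visit": ["express_intent_to_meet", "consult"],
}


def retrieve(facts, subject, relation, query_time, k=3):
    """Return up to k past facts about `subject`, oldest first.

    A fact qualifies if it happened before `query_time` and its relation
    is either the query relation or one listed for it in RULES.
    """
    allowed = {relation, *RULES.get(relation, [])}
    candidates = [
        f for f in facts
        if f.subject == subject and f.relation in allowed and f.time < query_time
    ]
    candidates.sort(key=lambda f: f.time)
    # Keep the k most recent facts; guard k <= 0 because [-0:] keeps everything.
    return candidates[-k:] if k > 0 else []


def build_prompt(history, subject, relation, query_time):
    """Serialize retrieved facts and the open query as plain text lines."""
    lines = [f"{f.time}: [{f.subject}, {f.relation}, {f.obj}]" for f in history]
    # The final line is left open so a language model can complete the object.
    lines.append(f"{query_time}: [{subject}, {relation},")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_facts = [
        Fact("A", "consult", "B", 1),
        Fact("A", "make_visit", "C", 2),
        Fact("A", "criticize", "D", 3),
        Fact("A", "express_intent_to_meet", "B", 4),
    ]
    history = retrieve(toy_facts, "A", "make_visit", query_time=5, k=2)
    print(build_prompt(history, "A", "make_visit", query_time=5))
```

Restricting the context to a few rule-matched facts keeps the input short, which is one plausible way a retrieval step could ease both the structural gap and the computational cost mentioned above; the actual selection criteria would follow the paper's method rather than this sketch.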