This paper presents a comprehensive quantitative and qualitative evaluation of Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We use eight distinct datasets spanning entity, relation, and event extraction, link prediction, and question answering. Empirically, we find that GPT-4 outperforms ChatGPT on the majority of tasks and even surpasses fine-tuned models on certain reasoning and question-answering datasets. We also investigate the potential generalization ability of LLMs for information extraction, which leads us to introduce the Virtual Knowledge Extraction task and to build the VINE dataset. Drawing on these empirical findings, we further propose AutoKG, a multi-agent-based approach that employs LLMs for KG construction and reasoning and is intended to point toward future directions for the field. We anticipate that our research can provide useful insights for future work on KGs\footnote{Code and datasets will be available at https://github.com/zjunlp/AutoKG.}.