Knowledge graphs (KGs) are crucial in artificial intelligence and are widely applied in downstream tasks such as enhancing Question Answering (QA) systems. Constructing KGs typically requires significant effort from domain experts. Recently, Large Language Models (LLMs) have been used for knowledge graph construction (KGC); however, most existing approaches take a local perspective, extracting knowledge triplets from individual sentences or documents. In this work, we introduce Graphusion, a framework for zero-shot KGC from free text. Its core fusion module provides a global view of triplets, incorporating entity merging, conflict resolution, and novel triplet discovery. We show how Graphusion can be applied to the natural language processing (NLP) domain and validate it in an educational scenario. Specifically, we introduce TutorQA, a new expert-verified benchmark for graph reasoning and QA, comprising six tasks and a total of 1,200 QA pairs. Our evaluation shows that Graphusion surpasses supervised baselines by up to 10% in accuracy on link prediction. Additionally, it achieves average human-evaluation scores of 2.92 and 2.37 out of 3 for concept entity extraction and relation recognition, respectively.
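To make the idea of a "global view" over locally extracted triplets more concrete, the toy sketch below fuses triplets pooled from several documents. It is a rule-based stand-in, not Graphusion's fusion method, and it covers only two of the three operations named above: entity merging (here, case-folding plus a hypothetical alias table) and conflict resolution (here, a majority vote over relations between the same entity pair). Novel triplet discovery is not modeled. The function names `normalize` and `fuse`, the alias table, and the relation labels are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# A triplet is (head, relation, tail). All data and names here are hypothetical.
Triplet = tuple[str, str, str]

# Hypothetical alias table used for toy entity merging.
ALIASES = {"lda": "latent dirichlet allocation"}


def normalize(entity: str) -> str:
    """Toy entity merging: case-fold, collapse whitespace, map known aliases."""
    key = " ".join(entity.lower().split())
    return ALIASES.get(key, key)


def fuse(triplets: list[Triplet]) -> list[Triplet]:
    """Merge entities globally, then keep one relation per (head, tail) pair."""
    votes: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for head, rel, tail in triplets:
        votes[(normalize(head), normalize(tail))][rel] += 1

    fused: list[Triplet] = []
    for (head, tail), counter in votes.items():
        # Conflict resolution by majority vote; Counter.most_common breaks
        # ties in favor of the relation encountered first.
        rel, _ = counter.most_common(1)[0]
        fused.append((head, rel, tail))
    return sorted(fused)


if __name__ == "__main__":
    # Triplets as they might come from separate, locally processed documents.
    local = [
        ("Topic Modeling", "prerequisite_of", "LDA"),
        ("topic modeling", "prerequisite_of", "Latent Dirichlet Allocation"),
        ("Topic modeling", "used_for", "LDA"),
        ("Word Embedding", "prerequisite_of", "BERT"),
    ]
    for t in fuse(local):
        print(t)
```

Here the three surface forms of the first pair collapse into one entity pair, and the conflicting `used_for` edge is outvoted by `prerequisite_of`. An LLM-based fusion step could replace both the alias table and the vote with contextual judgments, which this sketch does not attempt.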