With knowledge graphs (KGs) at the center of numerous applications such as recommender systems and question answering, the need for generalized pipelines to construct and continuously update such KGs is increasing. While the individual steps necessary to create KGs from unstructured (e.g., text) and structured (e.g., databases) data sources are mostly well researched for one-shot execution, their adoption for incremental KG updates and the interplay between the individual steps have hardly been investigated systematically so far. In this work, we first discuss the main graph models for KGs and introduce the major requirements for future KG construction pipelines. Next, we provide an overview of the steps necessary to build high-quality KGs, including cross-cutting topics such as metadata management, ontology development, and quality assurance. We then evaluate the state of the art of KG construction with respect to the introduced requirements for specific popular KGs as well as some recent tools and strategies for KG construction. Finally, we identify areas in need of further research and improvement.