Automatic knowledge graph construction aims to manufacture structured human knowledge. To this end, much effort has historically been spent on extracting informative fact patterns from different data sources. More recently, however, research interest has shifted to acquiring conceptualized structured knowledge beyond informative data, and researchers have been exploring new ways of handling sophisticated construction tasks in diverse scenarios. There is thus a demand for a systematic review of paradigms for organizing knowledge structures beyond data-level mentions. To meet this demand, we comprehensively survey more than 300 methods to summarize the latest developments in knowledge graph construction. A knowledge graph is built in three steps: knowledge acquisition, knowledge refinement, and knowledge evolution. We review the processes of knowledge acquisition in detail, including obtaining entities with fine-grained types and their conceptual linkages to knowledge graphs, resolving coreferences, and extracting entity relationships in complex scenarios. The survey covers models for knowledge refinement, including knowledge graph completion and knowledge fusion. Methods for handling knowledge evolution are also systematically presented, including condition knowledge acquisition, condition knowledge graph completion, and knowledge dynamics. We present paradigms for comparing these methods along the axes of data environment, motivation, and architecture. We also briefly describe accessible resources that can help readers develop practical knowledge graph systems. The survey concludes with a discussion of the challenges and possible directions for future exploration.