Knowledge Graphs (KGs) are a major asset for companies thanks to their flexibility in data representation and their numerous applications, e.g., vocabulary sharing, question answering (Q/A), or recommendation systems. To build a KG, it is common practice to rely on automatic methods for extracting knowledge from heterogeneous sources. But in a noisy and uncertain world, extracted knowledge may not be reliable, and conflicts between data sources may occur. Integrating unreliable data would directly impact the use of the KG, so such conflicts must be resolved. This could be done manually by selecting the best data to integrate. This manual approach is highly accurate, but costly and time-consuming. That is why recent efforts focus on automatic approaches, a challenging task since it requires handling the uncertainty of extracted knowledge throughout its integration into the KG. We survey state-of-the-art approaches in this direction and present the construction of both open and enterprise KGs and how their quality is maintained. We then describe different knowledge extraction methods and the additional uncertainty they introduce. We also discuss downstream tasks after knowledge acquisition, including KG completion using embedding models, knowledge alignment, and knowledge fusion, in order to address the problem of knowledge uncertainty in KG construction. We conclude with a discussion of the remaining challenges and perspectives when constructing a KG while taking uncertainty into account.