In recent years, knowledge graphs (KGs), in particular in the form of labeled property graphs (LPGs), have become essential components in a broad range of applications. Because KGs lack strict schemas, they are prone to structural issues that lead to redundancies and, subsequently, to inconsistencies and anomalies; nevertheless, the problem of KG quality has so far received little attention. Inspired by normalization using functional dependencies for relational data, a first approach exploiting dependencies within nodes has been proposed. However, real-world KGs also exhibit functional dependencies involving edges. In this paper, we therefore propose graph-native normalization, which considers dependencies within nodes, within edges, and across their combination. We define a range of graph-native normal forms and graph object functional dependencies and propose algorithms for transforming graphs accordingly. We evaluate our contributions using a broad range of synthetic and native graph datasets.