Recent standardization efforts for graph databases have led to standard query languages such as GQL and SQL/PGQ, and to constraint languages such as Property Graph Constraints (PG-Constraints). In this paper, we embark on the study of repairing property graphs under PG-Constraints. We identify a significant subset of PG-Constraints that encodes denial constraints and includes recursion as a key feature, while still permitting automata-based structural analyses of errors. For these constraints, we present a comprehensive pipeline for repairing property graphs that changes the graph topology through node, edge and, optionally, label deletions. We investigate three algorithmic strategies for the repair procedure: one based on Integer Linear Programming (ILP), a naive greedy algorithm, and an LP-guided greedy algorithm. Our experiments on various real-world datasets show that allowing label deletions can reduce the number of deletions by 59% compared to repairs that use only node and edge deletions. Moreover, the LP-guided greedy algorithm reduces runtime by up to 97% compared to the ILP strategy while matching its repair quality.
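To make the deletion-based repair setting concrete, here is a minimal sketch of a naive greedy repair. It assumes that each constraint violation has already been computed as the set of graph elements (nodes, edges, or node labels) whose deletion would eliminate that violation. Under this assumption, a repair is any set of elements that hits every violation. An ILP strategy would then minimize $\sum_e x_e$ subject to $\sum_{e \in V} x_e \ge 1$ for every violation $V$, with $x_e \in \{0,1\}$. The function name `greedy_repair` and the element encoding (`"n1"` for a node, `"e1"` for an edge, `"n2:Person"` for a label) are hypothetical illustrations, not the paper's actual implementation.

```python
from collections import Counter


def greedy_repair(violations):
    """Naive greedy repair over precomputed violations (hypothetical sketch).

    violations: iterable of sets of element ids, where deleting any one
    element of a set is assumed to resolve that violation.
    Returns the set of element ids chosen for deletion.
    """
    remaining = [set(v) for v in violations]
    if any(not v for v in remaining):
        # An empty violation could never be resolved by deleting elements.
        raise ValueError("every violation must contain at least one element")
    deleted = set()
    while remaining:
        # Count how many unresolved violations each element participates in.
        counts = Counter(elem for v in remaining for elem in v)
        # Pick the element hitting the most violations; ties broken by id.
        best = min(counts, key=lambda e: (-counts[e], e))
        deleted.add(best)
        # A violation is resolved once any of its elements is deleted.
        remaining = [v for v in remaining if best not in v]
    return deleted


if __name__ == "__main__":
    # Three hypothetical violations sharing some nodes and edges.
    example = [{"n1", "e1"}, {"n1", "e2"}, {"n2", "e2"}]
    print(sorted(greedy_repair(example)))
```

Unlike an ILP solver, this greedy choice does not guarantee a minimum repair. An LP-guided variant would instead use a fractional LP solution to decide which elements to delete first.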