As graph-structured data becomes more widely used, interest is growing in graph data dependencies and their applications, for example in graph data profiling. Graph Generating Dependencies (GGDs) are a class of dependencies for property graphs that can express relations between different graph patterns and constraints based on attribute similarity. The rich syntax and semantics of GGDs make them a good candidate for graph data profiling. However, GGDs are difficult to define manually, especially when no data experts are available. In this paper, we propose GGDMiner, a framework that automatically discovers approximate GGDs from graph data, with the aim of profiling the graph for the user through the discovered GGDs. GGDMiner has three main steps: (1) pre-processing, (2) candidate generation, and (3) GGD extraction. To reduce memory consumption and execution time, GGDMiner uses a factorized representation of each discovered graph pattern, called an Answer Graph. Our results show that the discovered set of GGDs gives an overview of the input graph, covering both schema-level information and correlations between graph patterns and attributes.