Mining information from graph databases is becoming increasingly important. Current methods approach this problem by identifying subgraphs with specific topologies; to date, no work has focused on jointly expressing the syntax and semantics of mining operations over rich property graphs. We define MINE GRAPH RULE, a new operator for mining association rules from graph databases, by extending classical approaches used in relational databases and exploited by recommender systems. We describe the syntax and semantics of the operator, which is based on measuring the support and confidence of each rule, and then provide several examples of increasing complexity built on a realistic scenario; the operator embeds Cypher for expressing the mining conditions. MINE GRAPH RULE is implemented on top of Neo4j, the most successful graph database system; it takes advantage of built-in optimizations of the Neo4j engine, as well as optimizations defined in the context of relational association rules. Our implementation is available as a portable Neo4j plugin. Finally, we report execution performance in a variety of settings, varying the operators, the size of the graph, the ratio between node types, the method for creating relationships, and the maximum support and confidence.
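For readers unfamiliar with the two measures named above, the following minimal sketch computes support and confidence in the classical relational (market-basket) sense. It is not the MINE GRAPH RULE operator and does not reproduce its graph semantics; the function name `support_and_confidence` and the toy baskets are hypothetical and introduced only for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple


def support_and_confidence(
    transactions: List[FrozenSet[str]],
    body: Iterable[str],
    head: Iterable[str],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule body => head.

    Classical definitions (an assumption about what the abstract refers to):
      support    = fraction of transactions containing body and head together
      confidence = fraction of transactions containing body that also contain head
    """
    body_set, head_set = frozenset(body), frozenset(head)
    n_total = len(transactions)
    # Transactions that contain every item of the rule body.
    n_body = sum(1 for t in transactions if body_set <= t)
    # Transactions that contain every item of both body and head.
    n_both = sum(1 for t in transactions if (body_set | head_set) <= t)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_body if n_body else 0.0
    return support, confidence


# Hypothetical toy data: four purchase baskets.
baskets = [
    frozenset({"book", "pen"}),
    frozenset({"book", "pen", "lamp"}),
    frozenset({"book"}),
    frozenset({"lamp"}),
]

# Rule {book} => {pen}
support, confidence = support_and_confidence(baskets, {"book"}, {"pen"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

In this toy data, two of the four baskets contain both items and three contain the body, so the rule has support 2/4 and confidence 2/3. A mining operator would typically keep only rules whose measures satisfy user-supplied thresholds.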