This paper introduces the RG (Relational Genetic) model, a revised relational model that represents graph-structured data in an RDBMS while preserving its topology, so that data in different formats can be extracted efficiently and effectively from disparate sources. The model comes with: (a) SQL$_\delta$, an SQL dialect augmented with graph pattern queries and tuple-vertex joins, with which one can extract graph properties via graph pattern matching and "semantically" match entities across relations and graphs; (b) a logical representation of graphs in an RDBMS, which introduces an exploration operator for efficient pattern querying and also supports browsing and updating graph-structured data; and (c) a strategy for uniformly evaluating SQL queries, pattern queries, and hybrid queries that join tuples with vertices, all inside an RDBMS, leveraging its optimizer and avoiding the performance degradation caused by switching between execution engines. We developed WhiteDB, a lightweight system implementing the model, to evaluate the benefits it brings on real-life data. Our experiments verify that the RG model answers graph pattern queries as efficiently as native graph engines; allows graphs and relations to be accessed in any order when searching for an optimal plan; and supports effective data enrichment.
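The abstract gives neither SQL$_\delta$ syntax nor the RG storage layout, so what follows is only an illustrative sketch built on assumed names. A graph is stored as hypothetical `vertex` and `edge` relations in SQLite. A hybrid query matches the path pattern (Person)-[knows]->(Person)-[works_at]->(Org) with ordinary joins and then joins the matched vertices with a relational `company` table. A hypothetical `explore` function extends partial matches one edge at a time by probing an index on `edge(src, label)`, loosely in the spirit of an exploration operator. None of this is WhiteDB's API or the paper's actual operator. Exact name equality stands in for the paper's "semantic" tuple-vertex matching.

```python
import sqlite3

# Hypothetical schema (assumption, not the RG model's real layout):
# a property graph as two relations, plus an ordinary relational table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vertex(vid INTEGER PRIMARY KEY, label TEXT, name TEXT);
CREATE TABLE edge(src INTEGER, dst INTEGER, label TEXT);
CREATE INDEX edge_src_label ON edge(src, label);
CREATE TABLE company(cname TEXT PRIMARY KEY, city TEXT);
""")
conn.executemany("INSERT INTO vertex VALUES (?, ?, ?)", [
    (1, "Person", "Alice"), (2, "Person", "Bob"),
    (3, "Org", "Acme"), (4, "Org", "Globex"),
])
conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", [
    (1, 2, "knows"), (2, 3, "works_at"), (1, 4, "works_at"),
])
conn.executemany("INSERT INTO company VALUES (?, ?)", [
    ("Acme", "Springfield"), ("Globex", "Shelbyville"),
])

# Hybrid query: the path pattern is evaluated as a chain of joins over the
# edge relation, then matched Org vertices are joined with company tuples.
# Exact name equality is a stand-in for semantic entity matching.
HYBRID_QUERY = """
SELECT p1.name, p2.name, o.name, c.city
FROM vertex p1
JOIN edge e1   ON e1.src = p1.vid AND e1.label = 'knows'
JOIN vertex p2 ON p2.vid = e1.dst AND p2.label = 'Person'
JOIN edge e2   ON e2.src = p2.vid AND e2.label = 'works_at'
JOIN vertex o  ON o.vid = e2.dst  AND o.label = 'Org'
JOIN company c ON c.cname = o.name
WHERE p1.label = 'Person'
"""
rows = conn.execute(HYBRID_QUERY).fetchall()


def explore(conn, start_label, steps):
    """Hypothetical vertex-at-a-time exploration (illustration only).

    Starts from all vertices with start_label and, for each
    (edge_label, dst_label) step, extends every partial path by probing
    the (src, label) index from the path's last vertex.
    """
    matches = [[vid] for (vid,) in conn.execute(
        "SELECT vid FROM vertex WHERE label = ? ORDER BY vid", (start_label,))]
    for edge_label, dst_label in steps:
        extended = []
        for path in matches:
            for (dst,) in conn.execute(
                    "SELECT e.dst FROM edge e JOIN vertex v ON v.vid = e.dst "
                    "WHERE e.src = ? AND e.label = ? AND v.label = ? "
                    "ORDER BY e.dst",
                    (path[-1], edge_label, dst_label)):
                extended.append(path + [dst])
        matches = extended
    return matches


paths = explore(conn, "Person", [("knows", "Person"), ("works_at", "Org")])
print(rows)   # [('Alice', 'Bob', 'Acme', 'Springfield')]
print(paths)  # [[1, 2, 3]]
```

Both routes find the same single match (Alice knows Bob, who works at Acme). The join form leaves plan choice to the SQL optimizer. The exploration form instead walks outward from bound vertices. Which one is cheaper depends on selectivity, which is the kind of trade-off the abstract says an RDBMS optimizer can weigh when graph and relation accesses may be ordered freely.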