Path queries are crucial for property graphs, and there is growing interest in queries that combine regular expressions over labels with constraints on the property values of vertices and edges. Evaluating such general path queries efficiently requires eliminating intermediate results early, as soon as they cannot be completed to a full result path. Neither state-of-the-art (SOA) graph DBMSs nor relational DBMSs can currently do this effectively for a large class of queries. We show that this problem can be addressed by giving a relational optimizer ``a little help'': early filtering opportunities are specified explicitly in the query. To this end, we propose ReCAP, an abstraction that greatly simplifies the implementation of early filtering techniques for any type of property constraint for which such early filtering can be derived. No matter how complex the constraint, one only needs to implement (1) an NFA-style state transition function and (2) a handful of functions that mirror those needed for user-defined aggregates. We show that with ReCAP, a standard relational DBMS like DuckDB can effectively push property constraints deep into the query plan, outperforming SOA graph and relational DBMSs by a factor of up to 400,000 across a variety of queries and input graphs.
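To make the two ingredients named above concrete, the following is a minimal Python sketch of early filtering on a toy graph. It is not ReCAP's actual interface. The graph, the label pattern `a+ b`, the budget constraint, and every function name (`transition`, `agg_init`, `agg_step`, `agg_may_complete`, `agg_final`, `constrained_paths`) are hypothetical and chosen only for illustration. The sketch assumes non-negative edge weights, which is what makes it safe to discard a partial path once its running sum exceeds the budget.

```python
from collections import deque

# Hypothetical toy graph: (source, label, weight, target).
# Assumptions: weights are non-negative and the graph is acyclic,
# so the unfiltered search below also terminates.
EDGES = [
    ("s", "a", 2, "u"),
    ("u", "a", 2, "v"),
    ("v", "b", 1, "t"),
    ("s", "a", 9, "w"),
    ("w", "b", 1, "t"),
]

# NFA for the label pattern a+ b: state 0 is the start, state 2 accepts.
NFA = {(0, "a"): {1}, (1, "a"): {1}, (1, "b"): {2}}
START, ACCEPTING = 0, {2}

def transition(state, label):
    """NFA-style state transition: the set of states reachable on `label`."""
    return NFA.get((state, label), set())

# Aggregate-style functions for the constraint "sum of edge weights <= BUDGET".
BUDGET = 6

def agg_init():
    return 0

def agg_step(acc, weight):
    return acc + weight

def agg_may_complete(acc):
    # Early filter: with non-negative weights the sum can only grow,
    # so a partial path already over budget can never become a result.
    return acc <= BUDGET

def agg_final(acc):
    return acc <= BUDGET

def constrained_paths(source, early_filter=True):
    """Breadth-first path search; returns (results, number of expanded partial paths)."""
    results, expanded = [], 0
    queue = deque([(source, START, agg_init(), [source])])
    while queue:
        vertex, state, acc, path = queue.popleft()
        expanded += 1
        if state in ACCEPTING and agg_final(acc):
            results.append((path, acc))
        for src, label, weight, dst in EDGES:
            if src != vertex:
                continue
            new_acc = agg_step(acc, weight)
            if early_filter and not agg_may_complete(new_acc):
                continue  # prune: no completion can satisfy the constraint
            for next_state in transition(state, label):
                queue.append((dst, next_state, new_acc, path + [dst]))
    return results, expanded

if __name__ == "__main__":
    print(constrained_paths("s", early_filter=True))
    print(constrained_paths("s", early_filter=False))
```

Both calls return the same single result path, but the filtered search expands fewer partial paths because the branch through the weight-9 edge is dropped as soon as it exceeds the budget. In a relational setting, the analogous effect would come from applying such a filter inside the recursive part of the query rather than after it.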