Modern property graph database query languages such as Cypher, PGQL, GSQL, and the standard GQL draw inspiration from the formalism of regular path queries (RPQs). In order to output walks explicitly, they depart from the classical and well-studied homomorphism semantics. However, it then becomes difficult to present results to users because RPQs may match infinitely many walks. The aforementioned languages use ad-hoc criteria to select a finite subset of those matches. For instance, Cypher uses trail semantics, discarding walks with repeated edges; PGQL and GSQL use shortest walk semantics, retaining only the walks of minimal length among all matched walks; and GQL allows users to choose from several semantics. Even though there is academic research on these semantics, it focuses almost exclusively on evaluation efficiency. In an attempt to better understand, choose and design RPQ semantics, we present a framework to categorize and compare them according to other criteria. We formalize several possible properties, pertaining to the study of RPQ semantics seen as mathematical functions mapping a database and a query to a finite set of walks. We show that some properties are mutually exclusive, or cannot be met. We also give several new RPQ semantics as examples. Some of them may provide ideas for the design of new semantics for future graph database query languages.
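To make the contrast between the semantics mentioned above concrete, the following minimal Python sketch treats each semantics as a filter over matched walks. It is an illustration under stated assumptions, not the formal framework of the paper: the toy graph `EDGES`, the function names, and the restriction to queries of the form `label+` are all hypothetical, and the `max_len` cut-off exists only because the true match set is infinite.

```python
from collections import deque

# Hypothetical toy database: each edge is (source, label, target),
# and its position in the list serves as its edge identifier.
EDGES = [
    ("s", "a", "u"),  # e0
    ("u", "a", "t"),  # e1
    ("s", "a", "t"),  # e2
    ("t", "a", "s"),  # e3, closes a cycle, so a+ matches infinitely many walks
]


def matched_walks(edges, label, source, target, max_len):
    """Enumerate walks from source to target matching `label+`.

    A walk is a tuple of edge ids. Only walks with at most `max_len`
    edges are returned; this bound is an artificial cut-off for the sketch.
    """
    results = []
    queue = deque([(source, ())])
    while queue:
        node, walk = queue.popleft()
        if walk and node == target:
            results.append(walk)
        if len(walk) == max_len:
            continue
        for edge_id, (src, lab, tgt) in enumerate(edges):
            if src == node and lab == label:
                queue.append((tgt, walk + (edge_id,)))
    return results


def trail_semantics(walks):
    """Keep only trails, i.e. walks in which no edge is repeated."""
    return [w for w in walks if len(set(w)) == len(w)]


def shortest_semantics(walks):
    """Keep only the walks of minimal length among all matched walks."""
    if not walks:
        return []
    best = min(len(w) for w in walks)
    return [w for w in walks if len(w) == best]


walks = matched_walks(EDGES, "a", "s", "t", max_len=5)
trails = trail_semantics(walks)
shortest = shortest_semantics(walks)
```

On this toy graph the set of trails stops growing once `max_len` reaches 4, since a trail can use each of the four edges at most once, whereas the unfiltered match set keeps growing with the bound. That is the sense in which such a criterion selects a finite subset of the matches.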