A regular path query (RPQ) is a regular expression q that returns all node pairs (u, v) from a graph database that are connected by an arbitrary path labelled with a word from L(q). The obvious algorithmic approach to RPQ evaluation (called the PG-approach), i.e., constructing the product graph between an NFA for q and the graph database, is appealing due to its simplicity and also leads to efficient algorithms. However, it is unclear whether the PG-approach is optimal. We address this question by thoroughly investigating which upper complexity bounds can be achieved by the PG-approach, and we complement these with conditional lower bounds (in the sense of the fine-grained complexity framework). A special focus is put on enumeration and delay bounds, as well as the data complexity perspective. A main insight is that we can achieve optimal (or near-optimal) algorithms with the PG-approach, but the delay for enumeration is rather high (linear in the database). We explore three successful approaches towards enumeration with sub-linear delay: super-linear preprocessing, approximations of the solution sets, and restricted classes of RPQs.
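To make the PG-approach concrete, the following is a minimal sketch of RPQ evaluation by searching the product of an NFA and a graph database. It is an illustration under stated assumptions, not the algorithm analysed in the paper: the function name `evaluate_rpq`, the edge-triple encoding of the database, and the dictionary encoding of the NFA (without epsilon transitions) are all hypothetical choices made for this example. The product graph is never materialised; its nodes are the pairs (database node, NFA state) visited by a breadth-first search.

```python
from collections import defaultdict, deque


def evaluate_rpq(edges, nfa_start, nfa_accept, nfa_delta):
    """Return all pairs (u, v) joined by a path whose label word the NFA accepts.

    edges:      iterable of (source, label, target) triples (the graph database)
    nfa_start:  set of initial NFA states
    nfa_accept: set of accepting NFA states
    nfa_delta:  dict mapping (state, label) -> set of successor states
                (assumption: the NFA has no epsilon transitions)
    """
    adjacency = defaultdict(list)
    nodes = set()
    for u, label, v in edges:
        adjacency[u].append((label, v))
        nodes.update((u, v))

    answers = set()
    for source in nodes:
        # Start the search in every product node (source, initial state).
        start_pairs = {(source, q) for q in nfa_start}
        seen = set(start_pairs)
        queue = deque(start_pairs)
        while queue:
            node, state = queue.popleft()
            if state in nfa_accept:
                answers.add((source, node))
            # A product edge exists when the database edge and the NFA
            # transition carry the same label.
            for label, target in adjacency.get(node, ()):
                for next_state in nfa_delta.get((state, label), ()):
                    pair = (target, next_state)
                    if pair not in seen:
                        seen.add(pair)
                        queue.append(pair)
    return answers


# Example: the query a b* as a two-state NFA (state 0 initial, state 1 accepting).
example_delta = {(0, "a"): {1}, (1, "b"): {1}}
example_edges = [(1, "a", 2), (2, "b", 3), (3, "b", 2), (3, "a", 4)]
example_answers = evaluate_rpq(example_edges, {0}, {1}, example_delta)
print(sorted(example_answers))  # [(1, 2), (1, 3), (3, 4)]
```

This sketch runs one full search per source node and collects the whole answer set before returning anything, so it shows the idea of the product construction rather than the enumeration and delay bounds discussed above.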