A regular path query (RPQ) is a regular expression q that returns all node pairs (u, v) from a graph database that are connected by an arbitrary path labelled with a word from L(q). The obvious algorithmic approach to RPQ evaluation (called the PG-approach), i.e., constructing the product graph of an NFA for q and the graph database, is appealing due to its simplicity and also leads to efficient algorithms. However, it is unclear whether the PG-approach is optimal. We address this question by thoroughly investigating which upper complexity bounds can be achieved by the PG-approach, and we complement these with conditional lower bounds (in the sense of the fine-grained complexity framework). A special focus is put on enumeration and delay bounds, as well as on the data complexity perspective. A main insight is that the PG-approach yields optimal (or near-optimal) algorithms, but its delay for enumeration is rather high (linear in the size of the database). We explore three successful approaches to enumeration with sub-linear delay: super-linear preprocessing, approximations of the solution sets, and restricted classes of RPQs.
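To make the product-graph idea concrete, the following is a minimal sketch of naive RPQ evaluation, not the algorithms analysed in this work. It assumes the graph database is given as a list of labelled edges `(u, label, v)` and that the query has already been compiled into an NFA without epsilon transitions, given as a list of transitions `(p, label, q)`. The function name `evaluate_rpq` and all parameter names are hypothetical. The product graph is explored on the fly by one breadth-first search per source node, so no attempt is made to match any particular complexity bound.

```python
from collections import defaultdict, deque


def evaluate_rpq(edges, nfa_transitions, nfa_start, nfa_accepting):
    """Return all pairs (u, v) joined by a path whose label word the NFA accepts.

    Assumption: the NFA has a single start state and no epsilon transitions.
    """
    # Index database edges by source node: u -> [(label, v), ...]
    graph = defaultdict(list)
    nodes = set()
    for u, label, v in edges:
        graph[u].append((label, v))
        nodes.update((u, v))

    # Index NFA transitions by (state, label): (p, a) -> [q, ...]
    delta = defaultdict(list)
    for p, label, q in nfa_transitions:
        delta[(p, label)].append(q)

    answers = set()
    for source in nodes:
        # BFS over product-graph vertices (database node, NFA state),
        # starting from (source, nfa_start).
        start = (source, nfa_start)
        seen = {start}
        queue = deque([start])
        while queue:
            node, state = queue.popleft()
            if state in nfa_accepting:
                answers.add((source, node))
            for label, next_node in graph.get(node, ()):
                for next_state in delta.get((state, label), ()):
                    pair = (next_node, next_state)
                    if pair not in seen:
                        seen.add(pair)
                        queue.append(pair)
    return answers


# Example: the query a b* as an NFA with states 0 (start) and 1 (accepting).
example_edges = [("x", "a", "y"), ("y", "b", "z"), ("z", "b", "y"), ("z", "a", "x")]
example_nfa = [(0, "a", 1), (1, "b", 1)]
example_result = evaluate_rpq(example_edges, example_nfa, 0, {1})
```

In this sketch the whole answer set is computed before anything is returned. An enumeration algorithm would instead output pairs one at a time, and the time between consecutive outputs is the delay that the abstract refers to.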