A regular path query (RPQ) is a regular expression q that returns all node pairs (u, v) from a graph database that are connected by an arbitrary path labelled with a word from L(q). The obvious algorithmic approach to RPQ evaluation (called the PG-approach), i.e., constructing the product graph of an NFA for q and the graph database, is appealing due to its simplicity and also leads to efficient algorithms. However, it is unclear whether the PG-approach is optimal. We address this question by thoroughly investigating which upper complexity bounds can be achieved by the PG-approach, and we complement these with conditional lower bounds (in the sense of the fine-grained complexity framework). A special focus is put on enumeration and delay bounds, as well as on the data complexity perspective. A main insight is that the PG-approach yields optimal (or near-optimal) algorithms, but the delay for enumeration is rather high (linear in the size of the database). We explore three successful approaches towards enumeration with sub-linear delay: super-linear preprocessing, approximations of the solution sets, and restricted classes of RPQs.
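To make the PG-approach concrete, the following is a minimal sketch of RPQ evaluation via the product graph. It is an illustration under stated assumptions, not the algorithm analysed in the paper: the function name `evaluate_rpq`, the edge-triple encoding of the database, and the dictionary encoding of the NFA are all hypothetical, and the NFA is assumed to have no epsilon transitions. The product graph is explored implicitly by one breadth-first search per start node, and a pair (u, v) is reported whenever the search from (u, start state) reaches v in an accepting state.

```python
from collections import defaultdict, deque


def evaluate_rpq(edges, nfa_delta, start, finals):
    """Return all pairs (u, v) joined by a path whose label is accepted by the NFA.

    edges:     iterable of (u, label, v) triples (hypothetical database encoding)
    nfa_delta: dict mapping (state, label) -> set of successor states
               (assumed epsilon-free)
    start:     initial NFA state
    finals:    set of accepting NFA states
    """
    adj = defaultdict(list)
    nodes = set()
    for u, a, v in edges:
        adj[u].append((a, v))
        nodes.update((u, v))

    result = set()
    for u in nodes:
        # BFS over product-graph vertices (database node, NFA state).
        seen = {(u, start)}
        queue = deque(seen)
        while queue:
            v, p = queue.popleft()
            if p in finals:
                result.add((u, v))
            for a, w in adj[v]:
                for p2 in nfa_delta.get((p, a), ()):
                    if (w, p2) not in seen:
                        seen.add((w, p2))
                        queue.append((w, p2))
    return result


# Example: q = a b*, as an NFA with states 0 (start) and 1 (accepting).
delta = {(0, "a"): {1}, (1, "b"): {1}}
db = [(1, "a", 2), (2, "b", 3), (3, "b", 2), (3, "a", 4)]
answers = evaluate_rpq(db, delta, start=0, finals={1})
print(sorted(answers))  # [(1, 2), (1, 3), (3, 4)]
```

Each search in this sketch may traverse the whole product graph before it finds the next answer, which gives an informal picture of why enumeration built directly on this approach can incur a delay that grows with the size of the database.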