Regular Path Queries (RPQs), which are essentially regular expressions to be matched against the labels of paths in labeled graphs, are at the core of graph database query languages like SPARQL. One way to solve RPQs is to translate them into a sequence of operations on the adjacency matrix of each label. We design and implement a Boolean algebra on sparse matrix representations and, as an application, use them to handle RPQs. Our baseline representation uses the same space as the previously most compact index for RPQs and outperforms it on the hardest type of query: those where both RPQ endpoints are unspecified. Our more succinct structure, based on $k^2$-trees, is 4 times smaller than any existing representation that handles RPQs, and still solves complex RPQs in a few seconds. Our new sparse-matrix-based representations dominate a good portion of the space/time tradeoff map, being outperformed only by representations that use much more space. They are also of independent interest beyond solving RPQs.
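To illustrate how an RPQ can be translated into operations on per-label adjacency matrices, here is a minimal sketch. It assumes a toy model in which a Boolean sparse matrix is simply a set of (row, column) pairs. This is not the representation described above (in particular it is not a $k^2$-tree), and the function names `union`, `product`, and `star`, as well as the example graph, are hypothetical. Alternation maps to Boolean sum, concatenation to Boolean product, and Kleene star to reflexive-transitive closure.

```python
# Illustrative sketch only: a Boolean sparse matrix is modeled as a set of
# (row, col) pairs. This is an assumption made for brevity, not the compact
# representation of the paper. All names here are hypothetical.

def union(a, b):
    """Boolean sum of two matrices: evaluates the alternation r1 | r2."""
    return a | b

def product(a, b):
    """Boolean product of two matrices: evaluates the concatenation r1 / r2."""
    # Index b by row so that each nonzero (i, j) of a is joined with row j of b.
    by_row = {}
    for (j, k) in b:
        by_row.setdefault(j, set()).add(k)
    return {(i, k) for (i, j) in a for k in by_row.get(j, ())}

def identity(nodes):
    """Identity matrix over the given node set (paths of length zero)."""
    return {(v, v) for v in nodes}

def star(a, nodes):
    """Reflexive-transitive closure: evaluates the Kleene star r*."""
    result = identity(nodes)
    frontier = result
    # Repeatedly extend newly found pairs by one more step until none are new.
    while frontier:
        frontier = product(frontier, a) - result
        result |= frontier
    return result

# Toy labeled graph: one adjacency matrix per edge label.
nodes = {0, 1, 2, 3}
adj = {
    "a": {(0, 1)},
    "b": {(1, 2), (2, 3)},
}

# The RPQ a/b* with both endpoints unspecified: all pairs (x, y) connected
# by a path labeled with one 'a' followed by zero or more 'b'.
answer = product(adj["a"], star(adj["b"], nodes))
print(sorted(answer))  # [(0, 1), (0, 2), (0, 3)]
```

In this toy model the whole answer matrix is materialized, which corresponds to the case where both endpoints are unspecified. Fixing an endpoint would amount to restricting the rows or columns of the operands before multiplying.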