Recursive queries and recursive derived tables constitute an important part of the SQL standard. Their efficient processing is important for many real-life applications that rely on graph or hierarchy traversal. Position-enabled column-stores offer a novel opportunity to improve run times for this type of query. Such systems allow the engine to explicitly use data positions (row ids) inside its core and thus enable novel, efficient implementations of query plan operators. In this paper, we present an approach that significantly speeds up recursive query processing inside RDBMSes. Its core idea is to employ a particular aspect of column-store technology, late materialization, which enables the query engine to manipulate data positions during query execution. Based on it, we propose two sets of Volcano-style operators intended to process different query cases. In order to validate our ideas, we have implemented the proposed approach in PosDB, a column-store RDBMS with SQL support. We experimentally demonstrate the viability of our approach by providing a comparison with PostgreSQL. Experiments show that for breadth-first search: 1) our position-based approach performs up to 6x better than PostgreSQL, 2) our tuple-based one results in only a 3x improvement when using a special rewriting technique, but it can work in a larger number of cases, and 3) neither approach can be emulated efficiently in row-stores.
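To illustrate the general idea of position-based processing with late materialization, the following is a minimal Python sketch of a breadth-first traversal over a columnar edge table. It is not the PosDB operator set. The columns `src`, `dst`, and `label`, the index `src_index`, and the functions `bfs_positions` and `materialize` are all hypothetical names introduced here for illustration. The traversal collects only row positions, and full tuples are assembled once at the end.

```python
from collections import defaultdict

# Hypothetical column-store edge table: each column is a separate array,
# and a row is identified only by its position (row id).
src = [1, 1, 2, 3, 4]
dst = [2, 3, 4, 4, 5]
label = ["a", "b", "c", "d", "e"]

# Assumed index from a src value to the positions of the matching rows.
src_index = defaultdict(list)
for pos, value in enumerate(src):
    src_index[value].append(pos)


def bfs_positions(start, max_depth):
    """Return positions of the edge rows reached within max_depth levels."""
    visited = {start}
    frontier = [start]
    result = []
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for pos in src_index.get(node, []):
                result.append(pos)   # keep the position, not the tuple
                target = dst[pos]    # read only the column that is needed
                if target not in visited:
                    visited.add(target)
                    next_frontier.append(target)
        if not next_frontier:
            break
        frontier = next_frontier
    return result


def materialize(positions, columns):
    """Late materialization: build tuples only for the final positions."""
    return [tuple(col[p] for col in columns) for p in positions]


positions = bfs_positions(start=1, max_depth=2)
rows = materialize(positions, [src, dst, label])
```

In this sketch the `label` column is never touched during traversal. It is read only in `materialize`, which is the kind of saving that manipulating positions instead of tuples is meant to provide.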