Despite decades of research into query optimization, optimizing queries with disjunctive predicate expressions remains a challenge. The solutions employed by existing systems, where they exist at all, are often simplistic and force the execution engine to perform much redundant work. To address this problem, we propose a novel form of query execution called tagged execution. Tagged execution groups tuples into subrelations according to which predicates in the query they satisfy (or fail to satisfy) and tags them with that information. Query operators can then use these tags as additional context at runtime, which lets them eliminate much of the redundant work performed by traditional engines and apply predicate pushdown to disjunctive predicates. However, tagged execution brings its own challenges, and deciding which tags to create is nontrivial: careless tag creation can cause an exponential blowup in the tag space, with the overhead outweighing the benefits. To address this issue, we present a technique called tag generalization that minimizes the space of tags. We implemented the tagged execution model with tag generalization in our system, Basilisk, and our evaluation shows an average 2.7x runtime speedup over the traditional execution model, with speedups of up to 19x in certain situations.
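The core idea of tagging can be illustrated with a small, self-contained sketch. It is a toy model of the mechanism described above, not Basilisk's implementation. Everything in it is a hypothetical choice made for illustration: the query over tables `R(id, a)` and `S(id, b)`, the predicate names `p1`, `p2`, `p3`, and the helpers `tag_relation`, `can_satisfy`, `push_down`, and `tagged_hash_join`. The sketch shows three steps. Each table is split into subrelations keyed by which of its own predicates hold. Subrelations that can never satisfy the disjunction are then dropped before the join, which is a form of predicate pushdown. Finally, the WHERE clause is decided once per pair of tags during the join, not once per joined tuple. Tag generalization is not modeled.

```python
from collections import defaultdict
from itertools import product

# Hypothetical query over R(id, a) and S(id, b):
#   SELECT * FROM R JOIN S ON R.id = S.id
#   WHERE (R.a > 10 AND S.b < 5) OR R.a < 0
# Atomic predicates, keyed by (hypothetical) name and grouped by table.
R_PREDS = {"p1": lambda r: r["a"] > 10, "p3": lambda r: r["a"] < 0}
S_PREDS = {"p2": lambda s: s["b"] < 5}
ALL_PREDS = list(R_PREDS) + list(S_PREDS)


def where_clause(t):
    """The disjunctive WHERE clause, evaluated on a tag (predicate -> bool)."""
    return (t["p1"] and t["p2"]) or t["p3"]


def tag_relation(rows, preds):
    """Split rows into subrelations keyed by which predicates each row satisfies."""
    groups = defaultdict(list)
    for row in rows:
        tag = tuple(sorted((name, pred(row)) for name, pred in preds.items()))
        groups[tag].append(row)
    return dict(groups)


def can_satisfy(tag):
    """True if some assignment to the predicates missing from `tag` satisfies the clause."""
    known = dict(tag)
    unknown = [p for p in ALL_PREDS if p not in known]
    return any(
        where_clause({**known, **dict(zip(unknown, values))})
        for values in product([False, True], repeat=len(unknown))
    )


def push_down(groups):
    """Pushdown step: drop whole subrelations that can never satisfy the clause."""
    return {tag: rows for tag, rows in groups.items() if can_satisfy(tag)}


def tagged_hash_join(r_groups, s_groups):
    """Join subrelation pairs, deciding the WHERE clause once per pair of tags."""
    out = []
    for s_tag, s_rows in s_groups.items():
        index = defaultdict(list)  # hash index on S.id for this subrelation
        for s in s_rows:
            index[s["id"]].append(s)
        for r_tag, r_rows in r_groups.items():
            if not where_clause(dict(r_tag + s_tag)):
                continue  # every joined tuple from this pair would fail the clause
            for r in r_rows:
                for s in index[r["id"]]:
                    out.append({**r, "b": s["b"]})
    return out


R = [{"id": 1, "a": 20}, {"id": 2, "a": 5}, {"id": 3, "a": -3}, {"id": 4, "a": 15}]
S = [{"id": 1, "b": 2}, {"id": 2, "b": 1}, {"id": 3, "b": 9}, {"id": 4, "b": 7}]
r_groups = push_down(tag_relation(R, R_PREDS))  # drops the (p1=False, p3=False) group (id 2)
s_groups = push_down(tag_relation(S, S_PREDS))  # keeps both groups: p3 may still hold
print(tagged_hash_join(r_groups, s_groups))
# [{'id': 1, 'a': 20, 'b': 2}, {'id': 3, 'a': -3, 'b': 9}]
```

In this toy encoding, each tag assigns True or False to every predicate on its table. A relation with k predicates can therefore split into as many as 2^k subrelations. This is an instance of the exponential tag-space growth described above, and the sketch does nothing to limit it.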