Acyclic join queries can be evaluated instance-optimally using Yannakakis' algorithm, which avoids needlessly large intermediate results by means of semi-join passes. Recent work proposes addressing the significant constant factors hidden in a naive implementation of Yannakakis' algorithm by decomposing the hash join operator into two suboperators, called Lookup and Expand. In this paper, we present a novel method for integrating Lookup and Expand plans into interpreted environments, such as column stores, formalizing them in Nested Semijoin Algebra (NSA) and implementing them through a shredding approach. We characterize the NSA expressions that can be evaluated instance-optimally as those that are 2-phase: no "shrinking" operator is applied after an unnest (i.e., an Expand). We introduce Shredded Yannakakis (SYA), an evaluation algorithm for acyclic joins that starts from a binary join plan, transforms it into a 2-phase NSA plan, and then evaluates that plan through the shredding technique. We show that SYA is provably robust (i.e., it never produces large intermediate results) and without regret (i.e., it is never worse than the binary join plan under a suitable cost model) on the class of well-behaved binary join plans. Our experiments on a suite of 1,849 queries show that SYA improves performance for 88.7% of the queries, with speedups of up to 188x, while remaining competitive on the others. We hope this approach offers a fresh perspective on Yannakakis' algorithm, helping system engineers better understand its practical benefits and facilitating its adoption in a broader range of query engines.
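The 2-phase idea can be illustrated with a minimal sketch, assuming a three-relation path query R(a,b) ⋈ S(b,c) ⋈ T(c,d) over in-memory Python lists of dicts. The helper names `build_index`, `lookup`, `expand`, and `path_join` are hypothetical and are not the paper's API. This is not SYA or its shredded column-store representation. It only shows how separating a hash join into a Lookup (a nested semijoin that attaches each row's bucket of partners and drops rows with none) and an Expand (an unnest) lets every shrinking step run before any expansion.

```python
from collections import defaultdict

# Hypothetical helpers for illustration only; not the paper's implementation.

def build_index(items, key_fn):
    """Hash items on a join key; returns a plain dict of key -> list of items."""
    index = defaultdict(list)
    for item in items:
        index[key_fn(item)].append(item)
    return dict(index)  # plain dict so membership tests never insert keys

def lookup(rows, key_fn, index):
    """Lookup: pair each row with its bucket of join partners (nested result).
    Rows without partners are dropped, so output size <= input size ("shrinking")."""
    return [(row, index[key_fn(row)]) for row in rows if key_fn(row) in index]

def expand(nested):
    """Expand: unnest (row, bucket) pairs into flat (row, partner) pairs."""
    return [(row, inner) for row, bucket in nested for inner in bucket]

def path_join(R, S, T):
    """Full join R(a,b) ⋈ S(b,c) ⋈ T(c,d), evaluated in two phases."""
    # Phase 1: only Lookups (shrinking), bottom-up from the leaf T.
    t_idx = build_index(T, lambda t: t["c"])
    s_nested = lookup(S, lambda s: s["c"], t_idx)          # S ⋉ T, nested
    s_idx = build_index(s_nested, lambda st: st[0]["b"])
    r_nested = lookup(R, lambda r: r["b"], s_idx)          # R ⋉ (S ⋉ T), nested
    # Phase 2: only Expands. Every bucket reached here is non-empty, so each
    # partial tuple extends to at least one output tuple (no dangling work).
    rs = expand(r_nested)                                  # (r, (s, t_bucket))
    rst = expand([((r, s), tb) for r, (s, tb) in rs])      # ((r, s), t)
    return [(r["a"], r["b"], s["c"], t["d"]) for (r, s), t in rst]

if __name__ == "__main__":
    R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
    S = [{"b": 10, "c": 100}, {"b": 10, "c": 101}, {"b": 20, "c": 200}]
    T = [{"c": 100, "d": "x"}, {"c": 100, "d": "y"}]
    print(path_join(R, S, T))  # [(1, 10, 100, 'x'), (1, 10, 100, 'y')]
```

In this sketch, the dangling rows (S with c = 101 or 200, and R with b = 20) are removed during the Lookup phase, before any tuple is expanded. Placing a shrinking Lookup after an Expand would instead let the Expand produce tuples that are later discarded, which is the situation the 2-phase restriction rules out.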