Instance-Optimal Acyclic Join Processing Without Regret: Engineering the Yannakakis Algorithm in Column Stores

Acyclic join queries can be evaluated instance-optimally using Yannakakis' algorithm, which avoids needlessly large intermediate results through semi-join passes. Recent work proposes to address the significant hidden constant factors arising from a naive implementation of Yannakakis by decomposing the hash join operator into two suboperators, called Lookup and Expand. In this paper, we present a novel method for integrating Lookup and Expand plans in interpreted environments, like column stores, formalizing them using Nested Semijoin Algebra (NSA) and implementing them through a shredding approach. We characterize the class of NSA expressions that can be evaluated instance-optimally as those that are 2-phase: no "shrinking" operator is applied after an unnest (i.e., expand). We introduce Shredded Yannakakis (SYA), an evaluation algorithm for acyclic joins that, starting from a binary join plan, transforms it into a 2-phase NSA plan, and then evaluates it through the shredding technique. We show that SYA is provably robust (i.e., never produces large intermediate results) and without regret (i.e., is never worse than the binary join plan under a suitable cost model) on the class of well-behaved binary join plans. Our experiments on a suite of 1,849 queries show that SYA improves performance for 88.7% of the queries with speedups up to 188x, while remaining competitive on the other queries. We hope this approach offers a fresh perspective on Yannakakis' algorithm, helping system engineers better understand its practical benefits and facilitating its adoption into a broader spectrum of query engines.
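As a rough illustration of the semi-join passes mentioned above, the following sketch runs the textbook Yannakakis algorithm on a three-relation path join. It is not the paper's SYA, NSA, or Lookup/Expand implementation; the relation names R, S, T, their attributes, and all helper functions are hypothetical and chosen only for this example.

```python
from collections import defaultdict

def semijoin(left, right, key):
    """Keep the rows of `left` that have at least one match in `right` on `key`."""
    keys = {row[key] for row in right}
    return [row for row in left if row[key] in keys]

def hash_join(left, right, key):
    """Plain hash join on a single shared attribute."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def yannakakis_path(R, S, T):
    """Evaluate R(a,b) JOIN S(b,c) JOIN T(c,d) using the join tree R - S - T, rooted at R."""
    # Bottom-up pass: reduce each parent by its child.
    S1 = semijoin(S, T, "c")
    R1 = semijoin(R, S1, "b")
    # Top-down pass: reduce each child by its parent.
    S2 = semijoin(S1, R1, "b")
    T2 = semijoin(T, S2, "c")
    # Join phase: every surviving row now contributes to at least one output row,
    # so no intermediate result contains rows that are later thrown away.
    return hash_join(hash_join(R1, S2, "b"), T2, "c")

# Hypothetical toy data: S contains a row (b=30) that matches T but not R,
# and a row (b=20) that matches R but not T. Both are filtered before joining.
R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
S = [{"b": 10, "c": 100}, {"b": 20, "c": 200}, {"b": 30, "c": 100}]
T = [{"c": 100, "d": "x"}, {"c": 100, "d": "y"}]

result = yannakakis_path(R, S, T)
for row in result:
    print(row)
```

In this toy run the two semi-join passes remove the dangling rows before any join is executed, which is the property that keeps intermediate results small. The extra passes over the data are also where the constant-factor overhead of a naive implementation comes from.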

