The performance of worst-case optimal join algorithms depends on the order in which the join attributes are processed. Selecting good orders before query execution is hard, because the space of possible orders is large and execution cost estimates are unreliable under data skew or data correlation. We propose ADOPT, a query engine that combines adaptive query processing with a worst-case optimal join algorithm, which uses an order on the join attributes instead of a join order on relations. ADOPT divides query execution into episodes in which different attribute orders are tried. Based on run-time feedback on attribute order performance, ADOPT converges quickly to near-optimal orders. It avoids redundant work across different orders via a novel data structure that keeps track of the parts of the join input that have already been processed successfully. It selects attribute orders to try via reinforcement learning, balancing the need to explore new orders against the desire to exploit promising ones. In experiments with various data sets and queries, ADOPT outperforms baselines, including commercial and open-source systems that use worst-case optimal join algorithms, whenever queries become complex and therefore difficult to optimize.
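The episode-based selection of attribute orders can be illustrated with a small sketch. It is not ADOPT's actual learner: it swaps in a plain UCB1 bandit as a simple stand-in for the reinforcement learning component. The names `ucb1_select`, `run_episodes`, and `toy_reward` are hypothetical. The reward function is an invented, deterministic placeholder for the run-time feedback an episode would produce.

```python
import itertools
import math


def ucb1_select(counts, rewards, total, c=math.sqrt(2)):
    """Return the index of the arm with the highest UCB1 score.

    Arms that were never tried are selected first (exploration);
    afterwards the score is the mean reward plus an exploration bonus.
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(
        range(len(counts)),
        key=lambda i: rewards[i] / counts[i]
        + c * math.sqrt(math.log(total) / counts[i]),
    )


def run_episodes(orders, episode_reward, episodes):
    """Try one attribute order per episode and record its reward.

    `episode_reward` is an assumed callback that stands in for the
    feedback a real engine would measure during an episode.
    Returns how often each order was selected.
    """
    counts = [0] * len(orders)
    rewards = [0.0] * len(orders)
    for t in range(1, episodes + 1):
        i = ucb1_select(counts, rewards, t)
        counts[i] += 1
        rewards[i] += episode_reward(orders[i])
    return counts


def toy_reward(order, best=("a", "b", "c")):
    """Hypothetical reward: higher when `order` agrees with `best`."""
    mismatches = sum(x != y for x, y in zip(order, best))
    return 1.0 / (1 + mismatches)


# All orders over three join attributes (hypothetical attribute names).
orders = list(itertools.permutations(("a", "b", "c")))
counts = run_episodes(orders, toy_reward, episodes=300)
most_tried = orders[counts.index(max(counts))]
print(most_tried, counts)
```

In this toy setting, every order is tried at least once, after which most episodes go to the order with the highest observed reward. A real engine would additionally have to carry over the work done in each episode, which is the role the abstract assigns to the data structure that tracks processed input.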