Online Sketch-based Query Optimization

Cost-based query optimization remains a critical task in relational databases even after decades of research and industrial development. Query optimizers rely on a wide range of statistical synopses -- including attribute-level histograms and table-level samples -- for accurate cardinality estimation. As the complexity of selection predicates and the number of join predicates increase, two problems arise. First, statistics cannot be incrementally composed to effectively estimate the cost of the sub-plans generated in plan enumeration. Second, small errors are propagated exponentially through join operators, which can lead to severely sub-optimal plans. In this paper, we introduce COMPASS, a novel query optimization paradigm for in-memory databases based on a single type of statistics -- Fast-AGMS sketches. In COMPASS, query optimization and execution are intertwined. Selection predicates and sketch updates are pushed down and evaluated online during query optimization. This allows Fast-AGMS sketches to be computed only over the relevant tuples -- which enhances cardinality estimation accuracy. Plan enumeration is performed over the query join graph by incrementally composing attribute-level sketches -- not by building a separate sketch for every sub-plan. We prototype COMPASS in MapD -- an open-source parallel database -- and perform extensive experiments over the complete JOB benchmark. The results show that COMPASS generates better execution plans -- both in terms of cardinality and runtime -- compared to four other database systems. Overall, COMPASS achieves a speedup ranging from 1.35X to 11.28X in cumulative query execution time over the considered competitors.
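To make the central idea concrete, the following is a minimal sketch of how a Fast-AGMS sketch can estimate a two-way join size when the selection predicate is applied before the sketch is updated. It is an illustration under stated assumptions, not the COMPASS implementation: the class name `FastAGMS`, its parameters (`rows`, `buckets`, `seed`), the helper `demo_estimate`, the polynomial hash functions, and the toy data are all hypothetical choices made for this example.

```python
import random
from collections import Counter
from statistics import median

PRIME = (1 << 61) - 1  # Mersenne prime used as the hash field (an assumption)


class FastAGMS:
    """Illustrative Fast-AGMS sketch over non-negative integer join keys."""

    def __init__(self, rows=7, buckets=1024, seed=42):
        rng = random.Random(seed)
        self.rows, self.buckets, self.seed = rows, buckets, seed
        # One pairwise-independent bucket hash per row: (a*x + b) mod p mod w.
        self.bucket_coef = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
                            for _ in range(rows)]
        # One 4-wise independent sign hash per row: degree-3 polynomial mod p.
        self.sign_coef = [[rng.randrange(PRIME) for _ in range(4)]
                          for _ in range(rows)]
        self.counters = [[0] * buckets for _ in range(rows)]

    def _bucket(self, r, key):
        a, b = self.bucket_coef[r]
        return ((a * key + b) % PRIME) % self.buckets

    def _sign(self, r, key):
        c0, c1, c2, c3 = self.sign_coef[r]
        v = (((c3 * key + c2) * key + c1) * key + c0) % PRIME
        return 1 if v & 1 else -1

    def update(self, key, weight=1):
        # Each tuple touches exactly one counter per row.
        for r in range(self.rows):
            self.counters[r][self._bucket(r, key)] += weight * self._sign(r, key)

    def join_size(self, other):
        # Both sketches must share the same hash functions to be comparable.
        if (self.rows, self.buckets, self.seed) != (other.rows, other.buckets, other.seed):
            raise ValueError("sketches must use the same rows, buckets, and seed")
        # Row-wise inner products, combined by taking the median across rows.
        per_row = [sum(x * y for x, y in zip(a, b))
                   for a, b in zip(self.counters, other.counters)]
        return median(per_row)


def demo_estimate():
    """Sketch R only over tuples that pass the predicate, then join with S."""
    # Toy relation R(key, year): every key appears with years 1990..2009.
    r_tuples = [(k, 1990 + i) for k in range(100) for i in range(20)]
    # Toy relation S(key): keys 50..149, five tuples each.
    s_keys = [k for k in range(50, 150) for _ in range(5)]

    sk_r, sk_s = FastAGMS(), FastAGMS()
    kept = Counter()
    for key, year in r_tuples:
        if year >= 2000:          # selection predicate evaluated before the update
            sk_r.update(key)
            kept[key] += 1
    for key in s_keys:
        sk_s.update(key)

    s_counts = Counter(s_keys)
    true_size = sum(kept[k] * s_counts[k] for k in kept)
    return sk_r.join_size(sk_s), true_size
```

Calling `demo_estimate()` returns the sketch-based estimate together with the exact join size of the filtered relations, so the two can be compared directly. Because the sketch for R is updated only with tuples that satisfy the predicate, the estimate reflects the filtered join rather than the join over the full table, which is the effect the abstract attributes to pushing selections and sketch updates down together. The estimate is approximate; its accuracy depends on the number of buckets and rows chosen here.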
