Query re-optimization is an adaptive query processing technique that re-invokes the optimizer at certain points during query execution. Its goal is to correct cardinality estimation errors dynamically, using statistics collected at runtime to adjust the query plan and improve overall performance. We identify a key weakness in existing re-optimization algorithms: their subquery division and re-optimization trigger strategies rely heavily on the optimizer's initial plan, which can be far from optimal. We therefore propose QuerySplit, a novel re-optimization algorithm that skips the potentially misleading global plan and instead generates subqueries directly from the logical plan as the basic re-optimization units. By using a cost function that prioritizes the execution of less "damaging" subqueries, QuerySplit postpones (and sometimes avoids) the execution of complex large joins, maximizing the probability that they receive smaller inputs. We implemented QuerySplit in PostgreSQL and compared it against four state-of-the-art re-optimization algorithms using the Join Order Benchmark. Our experiments show that QuerySplit reduces the benchmark execution time by 35% compared to the second-best alternative, and that its performance is within 4% of an optimal optimizer.
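To make the idea of re-ranking subqueries with runtime statistics concrete, the following is a minimal sketch, not the QuerySplit implementation. Everything in it is an assumption made for illustration: the `Subquery` type, the `toy_cost` function (a product of input sizes standing in for the real cost function, which the abstract does not specify), the rule that remaining subqueries read a materialized result in place of shared inputs, and all table names and row counts.

```python
from dataclasses import dataclass
from math import prod
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Subquery:
    name: str
    inputs: FrozenSet[str]  # tables (or earlier results) this subquery joins


def toy_cost(sq: Subquery, sizes: Dict[str, int]) -> int:
    # Hypothetical stand-in for a cost function: product of input sizes.
    return prod(sizes[t] for t in sq.inputs)


def run_with_reoptimization(
    subqueries: List[Subquery],
    sizes: Dict[str, int],
    execute: Callable[[Subquery], int],
) -> List[str]:
    sizes = dict(sizes)  # copy so the caller's statistics are not mutated
    pending = list(subqueries)
    order: List[str] = []
    while pending:
        # Re-rank with the latest sizes; this plays the role of
        # re-invoking the optimizer between subqueries.
        best = min(pending, key=lambda sq: toy_cost(sq, sizes))
        pending.remove(best)
        sizes[best.name] = execute(best)  # actual row count seen at runtime
        order.append(best.name)
        # Assumption: a remaining subquery that shared an input with the
        # executed one now reads the materialized result instead.
        pending = [
            Subquery(sq.name, (sq.inputs - best.inputs) | {best.name})
            if sq.inputs & best.inputs
            else sq
            for sq in pending
        ]
    return order


# Made-up statistics and runtime results, purely for illustration.
base_sizes = {"a": 10, "b": 100, "c": 1000, "d": 50}
actual_rows = {"q1": 5, "q2": 20, "q3": 40000}
queries = [
    Subquery("q1", frozenset({"a", "b"})),
    Subquery("q2", frozenset({"b", "c"})),
    Subquery("q3", frozenset({"c", "d"})),
]
order = run_with_reoptimization(queries, base_sizes, lambda sq: actual_rows[sq.name])
print(order)  # ['q1', 'q2', 'q3']
```

In this toy run, a static ranking by initial cost would execute `q3` before `q2` (50000 versus 100000). Because the small actual result of `q1` is fed back before the next choice, `q2` becomes cheaper and runs first, and `q3` is postponed until its inputs have shrunk.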