LQRS: Learned Query Re-optimization Framework for Spark SQL

The query optimizer is a fundamental component of database management systems. Recent studies have shown that learned query optimizers outperform traditional cost-based query optimizers. However, they fail to exploit valuable runtime observations generated during query execution to dynamically re-optimize the plan, thereby limiting further improvements in query performance. To address this issue, we propose learned query re-optimization, which allows optimization decisions to be deferred to execution time and guided by actual runtime observations. We realize this idea through LQRS, a learned query re-optimization framework that builds upon Spark SQL, exploiting runtime observations for dynamic plan refinement. Specifically, LQRS employs a curriculum reinforcement learning strategy and jointly supports pre-execution and in-execution optimization, allowing knowledge learned during execution to directly benefit pre-execution planning. Furthermore, we design a plug-and-play planner extension built upon the extensibility interfaces of Spark SQL, enabling online plan modification. Experiments on Spark SQL demonstrate that LQRS reduces end-to-end execution time by up to 90% compared to other learned query optimizers and query re-optimization methods.
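The gap between pre-execution and in-execution optimization can be illustrated with a toy example. The sketch below is not the LQRS method: where LQRS uses a learned policy, this sketch substitutes a fixed size-threshold rule for picking a join strategy, similar in spirit to what Spark's adaptive query execution does when it converts a sort-merge join to a broadcast join. The names `StageStats`, `choose_join`, `pre_execution_plan`, and `in_execution_plan` are hypothetical, and the byte counts are made-up illustrative values.

```python
from dataclasses import dataclass

# Illustrative threshold (assumption): 10 MB, the same value as Spark's default
# for spark.sql.autoBroadcastJoinThreshold.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass(frozen=True)
class StageStats:
    """Size of a join input, as estimated before execution and as observed at runtime."""
    name: str
    estimated_bytes: int  # optimizer's guess before the query runs
    observed_bytes: int   # actual size, known only after the upstream stage finishes


def choose_join(build_side_bytes: int, threshold: int = BROADCAST_THRESHOLD_BYTES) -> str:
    """Fixed rule standing in for a learned policy: broadcast small inputs."""
    if build_side_bytes <= threshold:
        return "broadcast_hash_join"
    return "sort_merge_join"


def pre_execution_plan(stage: StageStats) -> str:
    """Decide before execution, using only the estimate."""
    return choose_join(stage.estimated_bytes)


def in_execution_plan(stage: StageStats) -> str:
    """Defer the decision until the upstream stage has run, then use the observation."""
    return choose_join(stage.observed_bytes)


# A selective filter makes the real input far smaller than the estimate.
filtered_orders = StageStats(
    name="filtered_orders",
    estimated_bytes=500 * 1024 * 1024,  # hypothetical overestimate
    observed_bytes=2 * 1024 * 1024,     # hypothetical runtime observation
)

static_choice = pre_execution_plan(filtered_orders)
runtime_choice = in_execution_plan(filtered_orders)
print(static_choice, "->", runtime_choice)
```

In this example the estimate alone leads to a sort-merge join, while the runtime observation allows the cheaper broadcast join. A learned re-optimizer would replace `choose_join` with a trained model, but the point at which the decision is made is the same.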
