Analytics database workloads often contain queries that are executed repeatedly. Existing optimization techniques generally prioritize keeping optimization cost low, typically well below the time it takes to execute a single instance of a query. If a given query is going to be executed thousands of times, could it be worth investing significantly more optimization time? In contrast to traditional online query optimizers, we propose an offline query optimizer that searches a wide variety of plans and incorporates query execution as a primitive. Our offline query optimizer combines variational auto-encoders with Bayesian optimization to find optimized plans for a given query. We compare our technique to the optimal plans possible with PostgreSQL and recent reinforcement learning (RL)-based systems over several datasets, and show that our technique finds faster query plans.
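The question of whether extra optimization time pays off for a repeated query can be made concrete with a small break-even calculation. The sketch below is illustrative only and is not part of the proposed optimizer: the function name, its parameters, and the example numbers are all assumptions introduced here, and it assumes per-execution latencies stay constant across runs.

```python
import math


def break_even_executions(extra_optimization_s, baseline_latency_s, optimized_latency_s):
    """Return how many executions are needed before the extra offline
    optimization time is repaid by the per-execution saving.

    All arguments are in seconds. Returns None when the optimized plan
    is not faster than the baseline, since the cost is then never repaid.
    Hypothetical helper for illustration; not taken from the paper.
    """
    saving_per_run = baseline_latency_s - optimized_latency_s
    if saving_per_run <= 0:
        return None
    # Round up: a partial execution cannot repay the remaining cost.
    return math.ceil(extra_optimization_s / saving_per_run)


if __name__ == "__main__":
    # Assumed numbers: one hour of offline search, plan improves from 2.0 s to 1.5 s.
    print(break_even_executions(3600, 2.0, 1.5))
```

Under these assumed numbers, a query that runs only a few hundred times would not repay an hour of offline search, whereas one that runs tens of thousands of times would repay it many times over.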