Query optimization has become a research area in which classical algorithms are being challenged by machine learning algorithms. At the same time, recent work on learned query optimizers has shown that it is prudent to build on decades of database research and augment classical query optimizers by shrinking the plan search space through different types of hints (e.g., specifying the join type, the scan type, or the join order) rather than replacing the classical query optimizer entirely with machine learning models. This is especially relevant when classical optimizers cannot fully enumerate all logical and physical plans and must instead rely on less robust approaches such as genetic algorithms. However, even such symbiotic learned query optimizers are hampered by the need for vast amounts of training data, slow plan generation during inference, and unstable results across workload conditions. In this paper, we present GenJoin, a novel learned query optimizer that treats query optimization as a generative task and is capable of learning from a random set of subplan hints to produce query plans that outperform those of the classical optimizer. GenJoin is the first learned query optimizer that significantly and consistently outperforms PostgreSQL as well as state-of-the-art methods on two well-known real-world benchmarks across a variety of workloads using rigorous machine learning evaluations.
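To make the notion of a plan hint concrete, the following is a minimal sketch of how scan-type, join-type, and join-order hints might be rendered as a comment placed in front of a query. It assumes the comment syntax of the pg_hint_plan extension for PostgreSQL; the abstract does not state which hinting mechanism GenJoin uses, and the function names, table aliases, and query below are hypothetical and chosen only for illustration.

```python
from typing import Mapping, Sequence, Tuple


def render_hints(
    scans: Mapping[str, str],
    joins: Sequence[Tuple[Sequence[str], str]],
    leading: Sequence[str] = (),
) -> str:
    """Render hints as a pg_hint_plan-style comment (assumed syntax).

    scans:   table alias -> scan hint, e.g. {"t": "SeqScan"}
    joins:   (aliases, join hint) pairs, e.g. [(("t", "mc"), "HashJoin")]
    leading: aliases in the desired join order, e.g. ("t", "mc")
    """
    # One scan hint per alias, in the order the mapping was built.
    parts = [f"{method}({alias})" for alias, method in scans.items()]
    # One join hint per group of aliases.
    parts += [f"{method}({' '.join(aliases)})" for aliases, method in joins]
    # Optional join-order hint.
    if leading:
        parts.append(f"Leading({' '.join(leading)})")
    return "/*+ " + " ".join(parts) + " */"


def with_hints(sql: str, hint_comment: str) -> str:
    """Prefix a SQL string with the hint comment (hypothetical helper)."""
    return hint_comment + "\n" + sql


if __name__ == "__main__":
    hint = render_hints(
        scans={"t": "SeqScan", "mc": "IndexScan"},
        joins=[(("t", "mc"), "HashJoin")],
        leading=("t", "mc"),
    )
    # Hypothetical query; the table and column names are illustrative only.
    print(with_hints("SELECT * FROM title t JOIN movie_companies mc ON t.id = mc.movie_id;", hint))
```

Each hint removes alternatives from the search space the classical optimizer would otherwise explore: a scan hint fixes the access method for one table, a join hint fixes the join operator for a set of tables, and a leading hint constrains the join order.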