Query optimization is a critical task in database systems, focused on determining the most efficient way to execute a query from an enormous set of possible strategies. Traditional approaches rely on heuristic search methods and cost predictions, but these often struggle with the complexity of the search space and inaccuracies in performance estimation, leading to suboptimal plan choices. This paper presents LLMOpt, a novel framework that leverages Large Language Models (LLMs) to address these challenges through two innovative components: (1) LLM for Plan Candidate Generation (LLMOpt(G)), which eliminates heuristic search by utilizing the reasoning abilities of LLMs to directly generate high-quality query plans, and (2) LLM for Plan Candidate Selection (LLMOpt(S)), a list-wise cost model that compares candidates globally to enhance selection accuracy. To adapt LLMs for query optimization, we propose fine-tuning pre-trained models using optimization data collected offline. Experimental results on the JOB, JOB-EXT, and Stack benchmarks show that LLMOpt(G) and LLMOpt(S) outperform state-of-the-art methods, including PostgreSQL, BAO, and HybridQO. Notably, LLMOpt(S) achieves the best practical performance, striking a balance between plan quality and inference efficiency.
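The two-stage pipeline described above can be sketched as follows. This is a minimal illustrative mock, not the paper's implementation: the function names, the toy plan strings, and the heuristic score are all assumptions standing in for the fine-tuned LLMs. The point it shows is the structural difference of list-wise selection, where all candidates are compared in one call rather than cost-estimated one at a time.

```python
# Hypothetical sketch of the LLMOpt two-stage pipeline (illustrative names,
# not the paper's API). Stage 1 generates candidate plans; stage 2 picks one
# list-wise, i.e. by comparing ALL candidates jointly instead of estimating
# each plan's cost in isolation.

def generate_candidates(query: str) -> list[str]:
    # Stand-in for LLMOpt(G): the real system prompts a fine-tuned LLM to
    # emit plan candidates directly. We return fixed toy join orderings.
    return [
        "HashJoin(t1, HashJoin(t2, t3))",
        "NestedLoop(HashJoin(t1, t2), t3)",
        "MergeJoin(t1, MergeJoin(t2, t3))",
    ]

def select_listwise(query: str, candidates: list[str]) -> str:
    # Stand-in for LLMOpt(S): the real model sees the query and the whole
    # candidate list in one prompt and outputs the best plan's index. Here a
    # toy heuristic scores the list (prefer hash joins, penalize nested loops).
    def toy_score(plan: str) -> float:
        return plan.count("HashJoin") - 2 * plan.count("NestedLoop")
    return max(candidates, key=toy_score)

query = "SELECT * FROM t1 JOIN t2 ON ... JOIN t3 ON ..."
plans = generate_candidates(query)
best = select_listwise(query, plans)
```

A pair-wise or point-wise cost model would instead score each plan independently; the list-wise formulation lets the selector exploit relative differences across the whole candidate set, which the abstract credits for the improved selection accuracy.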