Query optimization is a critical task in database systems, focused on determining the most efficient way to execute a query from an enormous set of possible strategies. Traditional approaches rely on heuristic search methods and cost predictions, but these often struggle with the complexity of the search space and inaccuracies in performance estimation, leading to suboptimal plan choices. This paper presents LLMOpt, a novel framework that leverages Large Language Models (LLMs) to address these challenges through two innovative components: (1) LLM for Plan Candidate Generation (LLMOpt(G)), which eliminates heuristic search by utilizing the reasoning abilities of LLMs to directly generate high-quality query plans, and (2) LLM for Plan Candidate Selection (LLMOpt(S)), a list-wise cost model that compares candidates globally to enhance selection accuracy. To adapt LLMs for query optimization, we propose fine-tuning pre-trained models using optimization data collected offline. Experimental results on the JOB, JOB-EXT, and Stack benchmarks show that LLMOpt(G) and LLMOpt(S) outperform state-of-the-art methods, including PostgreSQL, BAO, and HybridQO. Notably, LLMOpt(S) achieves the best practical performance, striking a balance between plan quality and inference efficiency.
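The two-stage pipeline described above can be sketched as follows. This is a minimal illustrative mock, not the paper's implementation: the function names, the toy plan strings, and the heuristic score are all assumptions standing in for the fine-tuned LLMs. The point it shows is the structural difference of list-wise selection, where all candidates are compared in one call rather than cost-estimated one at a time.

```python
# Hypothetical sketch of the LLMOpt two-stage pipeline (illustrative names,
# not the paper's API). Stage 1 generates candidate plans; stage 2 picks one
# list-wise, i.e. by comparing ALL candidates jointly instead of estimating
# each plan's cost in isolation.

def generate_candidates(query: str) -> list[str]:
    # Stand-in for LLMOpt(G): the real system prompts a fine-tuned LLM to
    # emit plan candidates directly. We return fixed toy join orderings.
    return [
        "HashJoin(t1, HashJoin(t2, t3))",
        "NestedLoop(HashJoin(t1, t2), t3)",
        "MergeJoin(t1, MergeJoin(t2, t3))",
    ]

def select_listwise(query: str, candidates: list[str]) -> str:
    # Stand-in for LLMOpt(S): the real model sees the query and the whole
    # candidate list in one prompt and outputs the best plan's index. Here a
    # toy heuristic scores the list (prefer hash joins, penalize nested loops).
    def toy_score(plan: str) -> float:
        return plan.count("HashJoin") - 2 * plan.count("NestedLoop")
    return max(candidates, key=toy_score)

query = "SELECT * FROM t1 JOIN t2 ON ... JOIN t3 ON ..."
plans = generate_candidates(query)
best = select_listwise(query, plans)
```

A pair-wise or point-wise cost model would instead score each plan independently; the list-wise formulation lets the selector exploit relative differences across the whole candidate set, which the abstract credits for the improved selection accuracy.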