To address the challenges of large language model (LLM) performance on Text-to-SQL tasks, we introduce CHASE-SQL, a new framework that uses test-time compute in multi-agent modeling to improve candidate generation and selection. CHASE-SQL leverages LLMs' intrinsic knowledge to generate diverse, high-quality SQL candidates using different LLM generators with: (1) a divide-and-conquer method that decomposes complex queries into manageable sub-queries in a single LLM call; (2) chain-of-thought reasoning based on query execution plans, reflecting the steps a database engine takes during execution; and (3) an instance-aware synthetic example generation technique, which provides few-shot demonstrations tailored to each test question. To identify the best candidate, a selection agent ranks the candidates through pairwise comparisons using a fine-tuned binary candidate selection LLM. This selection approach has been demonstrated to be more robust than alternatives. The proposed generators-selector framework not only enhances the quality and diversity of SQL queries but also outperforms previous methods. Overall, CHASE-SQL achieves state-of-the-art execution accuracy of 73.0% and 73.01% on the test set and development set, respectively, of the BIRD Text-to-SQL benchmark, making CHASE-SQL the top submission on the leaderboard (at the time of paper submission).
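To make the notion of a query execution plan concrete, the sketch below asks SQLite for the plan of a query and renders it as numbered steps that could seed a chain-of-thought prompt. It is an illustration only, not the prompting procedure used by CHASE-SQL: the helper names `query_plan_steps` and `format_plan_for_prompt`, the `singer` table, and the prompt wording are all hypothetical, and the exact plan text depends on the SQLite version.

```python
import sqlite3
from typing import List


def query_plan_steps(conn: sqlite3.Connection, sql: str) -> List[str]:
    """Return the human-readable steps SQLite reports for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row has four columns; the last one is the textual description
    # of the step (for example a table scan or an index search).
    return [str(row[3]) for row in rows]


def format_plan_for_prompt(question: str, steps: List[str]) -> str:
    """Hypothetical prompt fragment listing the plan as numbered steps."""
    lines = [f"Question: {question}", "Execution plan steps:"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)


def demo() -> str:
    # Toy in-memory schema, assumed purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    steps = query_plan_steps(conn, "SELECT name FROM singer WHERE age > 30")
    conn.close()
    return format_plan_for_prompt("Which singers are older than 30?", steps)


if __name__ == "__main__":
    print(demo())
```

Here the plan comes from the database engine after a query already exists; in the framework described above, the reasoning in plan-like steps is what leads to the query.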
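The selection step, ranking candidates through pairwise comparisons, can be sketched as a simple round-robin vote. This is a minimal illustration under stated assumptions, not the authors' implementation: `select_by_pairwise_votes` and `toy_prefer_first` are hypothetical names, the comparator is a trivial stand-in for the fine-tuned selection LLM, and querying each pair in both orders is an assumption made here to reduce sensitivity to candidate order.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def select_by_pairwise_votes(
    candidates: Sequence[str],
    prefer_first: Callable[[str, str], bool],
) -> str:
    """Return the candidate that wins the most pairwise comparisons."""
    if not candidates:
        raise ValueError("need at least one candidate")
    scores: List[int] = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        # Assumption of this sketch: ask the comparator in both orders.
        for a, b in ((i, j), (j, i)):
            if prefer_first(candidates[a], candidates[b]):
                scores[a] += 1
            else:
                scores[b] += 1
    # Ties are broken in favour of the earliest candidate.
    best = max(range(len(candidates)), key=lambda k: scores[k])
    return candidates[best]


def toy_prefer_first(sql_a: str, sql_b: str) -> bool:
    # Hypothetical stand-in for the selection model: prefers the shorter query.
    # A real comparator would judge the two candidates against the question
    # and the database schema.
    return len(sql_a) <= len(sql_b)


if __name__ == "__main__":
    pool = [
        "SELECT name FROM singer WHERE age > 30 ORDER BY name",
        "SELECT name FROM singer WHERE age > 30",
        "SELECT name, age FROM singer WHERE age > 30 ORDER BY age DESC",
    ]
    print(select_by_pairwise_votes(pool, toy_prefer_first))
```

With n candidates this makes n(n-1) comparator calls, which stays cheap for the small candidate pools a few generators produce.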