While recent advancements in inference-time learning have improved LLM reasoning on Text-to-SQL tasks, current solutions still struggle on the most challenging tasks in the Bird-Bench (BIRD) benchmark. This is due to inadequate exploration of the solution space: thorough exploration is necessary to uncover promising candidate queries that can then be refined into the correct output. To address this challenge, we introduce CA-SQL, a novel Text-to-SQL pipeline that uses the estimated difficulty of a task to dynamically scale the breadth of exploration when generating solution candidates. In addition, we use a custom prompt seeding method, based on principles of evolutionary search, to further elicit exploratory behavior from the base LLM, and a novel voting method to select the best candidate solution at the end of the search. Experiments demonstrate that our solution achieves a state-of-the-art score of 51.72% on the "challenging" tier of BIRD development set problems using only GPT-4o-mini, outperforming other in-context learning approaches, even those that leverage larger models. Overall, our method attains a competitive 61.06% execution accuracy and 68.77% Soft F1 score on the BIRD development dataset.
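To make the idea of difficulty-scaled exploration concrete, the following is a minimal illustrative sketch, not the CA-SQL implementation. It assumes a difficulty estimate normalized to [0, 1], a linear mapping from difficulty to the number of candidates, and plain majority voting over execution results as a stand-in for the voting method, whose details are not given here. The names `candidate_budget`, `vote_by_execution`, `min_candidates`, and `max_candidates` are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def candidate_budget(difficulty: float,
                     min_candidates: int = 2,
                     max_candidates: int = 16) -> int:
    """Map an estimated difficulty in [0, 1] to a number of candidates.

    Assumption: a linear schedule; harder tasks get a wider search.
    """
    d = min(max(difficulty, 0.0), 1.0)  # clamp out-of-range estimates
    return min_candidates + round(d * (max_candidates - min_candidates))


def vote_by_execution(candidates: Sequence[str],
                      execute: Callable[[str], Hashable]) -> str:
    """Return the first candidate whose execution result is most common.

    Assumption: simple majority voting, used here only as a placeholder.
    `execute` must return a hashable result (e.g. a tuple of rows).
    """
    results = [execute(sql) for sql in candidates]
    winner, _ = Counter(results).most_common(1)[0]
    return candidates[results.index(winner)]
```

In this sketch an easy task (difficulty near 0) would receive only a couple of candidates, while a task judged "challenging" would receive the full budget, so compute is spent where broader exploration is most likely to matter.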