Recent advances in large language models (LLMs) have substantially improved performance on the Text-to-SQL task. Many of these works rely on post-correction of the generated SQL queries. However, most such correction consists of analyzing error cases to craft rule-based prompts that eliminate model bias, and it does not verify the queries by executing them. In addition, prevalent techniques depend primarily on GPT-4 and few-shot prompts, which makes them expensive. To investigate effective and cost-efficient methods for SQL refinement, we introduce Semantic-Enhanced Text-to-SQL with Adaptive Refinement (SEA-SQL). SEA-SQL comprises Adaptive Bias Elimination and Dynamic Execution Adjustment, and aims to improve performance while minimizing resource expenditure using zero-shot prompts. Specifically, SEA-SQL employs a semantic-enhanced schema to augment database information and optimize SQL queries. During SQL query generation, a fine-tuned adaptive bias eliminator mitigates the inherent biases of the LLM. Dynamic execution adjustment is applied to ensure that the bias-eliminated SQL query is executable. We conduct experiments on the Spider and BIRD datasets to evaluate the framework. The results show that SEA-SQL achieves state-of-the-art performance in the GPT-3.5 scenario at 9% to 58% of the generation cost. Furthermore, SEA-SQL is comparable to GPT-4 at only 0.9% to 5.3% of the generation cost.
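To make the idea of execution-based adjustment concrete, the following is a minimal sketch of a generic execute-and-repair loop, not the SEA-SQL implementation. It assumes a SQLite database and a hypothetical `repair` callable that stands in for the model call; the names `execute_with_adjustment`, `toy_repair`, and `max_repairs` are illustrative and do not come from the paper.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: (failed_sql, error_message) -> revised_sql.
# In practice this would be an LLM call; here it is any callable.
Repair = Callable[[str, str], str]


def execute_with_adjustment(
    conn: sqlite3.Connection,
    sql: str,
    repair: Repair,
    max_repairs: int = 2,
) -> Tuple[str, Optional[List[tuple]]]:
    """Execute sql; on an execution error, request a revision and retry.

    Returns the last query tried and its rows, or None for the rows
    if no attempt executed successfully.
    """
    for attempt in range(max_repairs + 1):
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            if attempt == max_repairs:
                break  # repair budget exhausted
            # Feed the database error message back to the repair step.
            sql = repair(sql, str(exc))
    return sql, None


def toy_repair(sql: str, error: str) -> str:
    # Stand-in for a model: fixes one known column-name typo.
    return sql.replace("nmae", "name")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER, name TEXT)")
conn.execute("INSERT INTO singer VALUES (1, 'Ada')")

final_sql, rows = execute_with_adjustment(
    conn, "SELECT nmae FROM singer", toy_repair
)
```

In this sketch the first query fails because the column `nmae` does not exist, the error message is passed to the repair step, and the revised query is executed. Bounding the number of repairs keeps the cost of the loop predictable, which matters when each repair is a paid model call.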