Text-to-SQL aims to generate SQL queries from natural language questions, thereby helping users query databases. Prompt learning with large language models (LLMs) has recently emerged as an approach to this task: prompts are designed to lead the LLM to understand the input question and generate the corresponding SQL. However, this approach struggles with the strict syntax requirements of SQL. Existing work prompts the LLM with a list of demonstration examples (i.e., question-SQL pairs), but such fixed prompts can hardly handle cases where the semantic gap between the retrieved demonstrations and the input question is large. In this paper, we propose a retrieval-augmented prompting method for an LLM-based Text-to-SQL framework, involving sample-aware prompting and a dynamic revision chain. Our approach incorporates sample-aware demonstrations, which include the composition of SQL operators and fine-grained information related to the given question. To retrieve questions that share similar intents with the input question, we propose two strategies for assisting retrieval. First, we leverage LLMs to simplify the original questions, unifying their syntax and thereby clarifying the users' intentions. To generate executable and accurate SQL without human intervention, we design a dynamic revision chain that iteratively adapts to fine-grained feedback from the previously generated SQL. Experimental results on three Text-to-SQL benchmarks demonstrate the superiority of our method over strong baseline models.
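To make the idea of a revision chain concrete, the following is a minimal sketch of an execute-and-revise loop, not the implementation used in this work. It assumes that feedback consists only of database error messages, that SQLite stands in for the target database, and that the model is exposed through a hypothetical `generate_sql(question, feedback)` callable; `revise_until_executable`, `fake_llm`, and the `singer` table are illustrative names.

```python
import sqlite3
from typing import Callable, Optional


def revise_until_executable(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    max_rounds: int = 3,
) -> str:
    """Regenerate SQL until it executes or the round budget is exhausted."""
    feedback: Optional[str] = None  # no feedback is available in the first round
    sql = ""
    for _ in range(max_rounds):
        sql = generate_sql(question, feedback)
        try:
            conn.execute(sql).fetchall()
            return sql  # the query ran, so stop revising
        except sqlite3.Error as exc:
            # Assumption: the error text is the only feedback passed back.
            feedback = f"Previous SQL: {sql}\nExecution error: {exc}"
    return sql  # best effort after max_rounds attempts


def fake_llm(question: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM call: wrong at first, fixed after feedback."""
    if feedback is None:
        return "SELECT nme FROM singer"  # deliberate typo in the column name
    return "SELECT name FROM singer"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT)")
conn.execute("INSERT INTO singer VALUES ('Ada')")
final_sql = revise_until_executable(conn, "List all singer names.", fake_llm)
print(final_sql)
```

The loop stops as soon as a query executes, so it checks executability only; judging whether the result actually answers the question would require richer feedback than this sketch models.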