Generating accurate Structured Query Language (SQL) is a long-standing problem, especially when a user's natural-language query must be matched against a structured database before the SQL can be produced. Existing models typically feed the query and the database schema into a large language model (LLM) and rely on the LLM to perform the semantic-structure matching and generate the SQL. However, such solutions overlook the structural information within user queries and databases, which can be used to improve SQL generation. This oversight can lead to inaccurate or unexecutable SQL. To fully exploit this structure, we propose a structure-to-SQL framework that leverages the inherent structural information to improve the SQL generation of LLMs. Specifically, we introduce our Structure-Guided SQL (SGU-SQL) generation model. SGU-SQL first links user queries and databases in a structure-enhanced manner. It then decomposes the complicated linked structures with grammar trees to guide the LLM to generate the SQL step by step. Extensive experiments on two benchmark datasets show that SGU-SQL outperforms sixteen SQL generation baselines.