Generating accurate Structured Query Language (SQL) is a long-standing problem, especially when matching users' semantic queries against structured databases and then generating structured SQL. Existing models typically feed the query and the database schema into a large language model (LLM) and rely on the LLM to perform semantic-structure matching and generate the SQL. However, such solutions overlook the structural information within user queries and databases, which can be used to improve SQL generation. This oversight can lead to inaccurate or unexecutable SQL. To fully exploit this structure, we propose a structure-to-SQL framework that leverages the inherent structural information to improve the SQL generation of LLMs. Specifically, we introduce the Structure Guided SQL (SGU-SQL) generation model. SGU-SQL first links user queries and databases in a structure-enhanced manner. It then decomposes the complicated linked structures with grammar trees to guide the LLM to generate the SQL step by step. Extensive experiments on two benchmark datasets show that SGU-SQL outperforms sixteen SQL generation baselines.