Schema linking is a crucial step in Text-to-SQL pipelines, which translate natural language queries into SQL. Its goal is to retrieve the relevant tables and columns (signal) while disregarding irrelevant ones (noise). However, imperfect schema linking can exclude columns that are essential for generating an accurate query. In this work, we revisit the need for schema linking when using the latest generation of large language models (LLMs). We find empirically that newer models are adept at identifying relevant schema elements during generation, without explicit schema linking. This allows Text-to-SQL pipelines to bypass schema linking entirely and pass the full database schema to the LLM, eliminating the risk of excluding necessary information. Furthermore, as alternatives to schema linking, we propose techniques that improve Text-to-SQL accuracy without discarding essential schema information. Our approach achieves 71.83\% execution accuracy on the BIRD benchmark, ranking first at the time of submission.
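The failure mode described above can be made concrete with a small sketch. Everything in it is an illustrative assumption: the three-table schema, the question, and the set of columns a correct query would need are hypothetical (not taken from BIRD). The linker is a naive token-overlap filter standing in for schema linking in general, not the authors' method or any published linker. The sketch shows how pruning can silently drop a join key that never appears in the question, while serializing the full schema cannot omit anything.

```python
# Hypothetical toy schema (table -> columns); not taken from BIRD.
SCHEMA = {
    "schools": ["school_id", "name", "county", "charter"],
    "scores": ["school_id", "avg_math", "num_test_takers"],
    "frpm": ["school_id", "free_meal_count", "enrollment"],
}

QUESTION = "Which charter schools in Alameda county have an average math score above 500?"

# Hypothetical gold columns a correct SQL query would touch:
# a join on school_id plus the filter/projection columns.
NEEDED = {
    ("schools", "school_id"), ("scores", "school_id"),
    ("schools", "county"), ("schools", "charter"), ("scores", "avg_math"),
}


def keyword_schema_link(question: str, schema: dict[str, list[str]]) -> dict[str, list[str]]:
    """Naive stand-in linker: keep a column only if one of its
    underscore-separated name tokens appears as a word in the question."""
    words = set(question.lower().replace("?", "").split())
    linked = {}
    for table, cols in schema.items():
        kept = [c for c in cols if set(c.split("_")) & words]
        if kept:
            linked[table] = kept
    return linked


def column_pairs(schema: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Flatten a schema into (table, column) pairs."""
    return {(t, c) for t, cols in schema.items() for c in cols}


def missing_columns(needed: set[tuple[str, str]], schema: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Columns the query needs that the (possibly pruned) schema lacks."""
    return needed - column_pairs(schema)


def schema_prompt(schema: dict[str, list[str]]) -> str:
    """Serialize a schema as simplified CREATE TABLE lines for an LLM prompt."""
    return "\n".join(f"CREATE TABLE {t} ({', '.join(cols)});" for t, cols in schema.items())


if __name__ == "__main__":
    linked = keyword_schema_link(QUESTION, SCHEMA)
    print(linked)  # {'schools': ['county', 'charter'], 'scores': ['avg_math']}
    print(missing_columns(NEEDED, linked))  # both school_id join keys are missing
    print(missing_columns(NEEDED, SCHEMA))  # set()
    print(schema_prompt(SCHEMA))
```

In this toy case the linker keeps `county`, `charter`, and `avg_math` but drops both `school_id` columns, so no correct join between `schools` and `scores` can be written from the pruned prompt. Passing the full schema trades a longer prompt for zero missing columns, which is the trade-off the abstract argues newer LLMs can afford.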