Schema linking is a crucial step in Text-to-SQL pipelines. Its goal is to retrieve the tables and columns of a target database that are relevant to a user's query while disregarding irrelevant ones. However, imperfect schema linking can exclude columns that are required for accurate query generation. In this work, we revisit schema linking when using the latest generation of large language models (LLMs). We find empirically that newer models are adept at utilizing relevant schema elements during generation even in the presence of large numbers of irrelevant ones. As such, our Text-to-SQL pipeline forgoes schema linking entirely whenever the schema fits within the model's context window, in order to minimize errors caused by filtering out required schema elements. Furthermore, instead of filtering contextual information, we highlight techniques such as augmentation, selection, and correction, and adopt them to improve the accuracy of our Text-to-SQL pipeline. Our approach ranks first on the BIRD benchmark, achieving an accuracy of 71.83%.
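The decision to skip schema linking when the schema fits in the context window can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the fit is measured, so the 4-characters-per-token estimate, the `context_budget` parameter, and the hypothetical `keyword_linker` fallback are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Schema = Dict[str, List[str]]  # table name -> column names


def render_schema(schema: Schema) -> str:
    """Serialize the schema as one line per table, e.g. users(id, name)."""
    return "\n".join(f"{table}({', '.join(cols)})" for table, cols in schema.items())


def estimate_tokens(text: str) -> int:
    """Assumption: a crude stand-in for a real tokenizer (about 4 chars per token)."""
    return (len(text) + 3) // 4


def keyword_linker(schema: Schema, question: str) -> Schema:
    """Hypothetical fallback linker: keep tables whose name or any column name
    appears as a substring of the question. Real schema linkers are more involved."""
    q = question.lower()
    return {
        table: cols
        for table, cols in schema.items()
        if table.lower() in q or any(col.lower() in q for col in cols)
    }


def choose_schema_context(
    schema: Schema,
    question: str,
    context_budget: int,
    link_schema: Callable[[Schema, str], Schema],
) -> str:
    """Return the full schema text if it fits the budget, else a filtered schema."""
    full = render_schema(schema)
    if estimate_tokens(full) + estimate_tokens(question) <= context_budget:
        return full  # no schema linking: nothing can be filtered out by mistake
    # Fallback only; this sketch does not check that the filtered schema fits.
    return render_schema(link_schema(schema, question))


schema = {"users": ["id", "name"], "orders": ["id", "user_id", "total"]}
question = "What is the total of all orders?"
print(choose_schema_context(schema, question, 1000, keyword_linker))
```

With a generous budget the full schema is passed through unchanged; only when the budget is exceeded does the filtering step, and with it the risk of dropping a required column, come into play.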