Generating SQL from user queries is a long-standing challenge in which the accuracy of the initial schema linking strongly affects subsequent SQL generation performance. However, current schema linking models still either miss relevant schema elements or include too many redundant ones. A key reason is that the commonly used metrics, recall and precision, fail to capture whether relevant elements are missing and therefore cannot reflect actual schema linking performance. Motivated by this, we propose enhanced schema linking metrics that introduce a \textbf{restricted missing indicator}. Accordingly, we introduce the \textbf{\underline{K}n\underline{a}psack optimization-based \underline{S}chema \underline{L}inking \underline{A}pproach (KaSLA)}, a plug-in schema linking method designed to avoid missing relevant schema elements while minimizing the inclusion of redundant ones. KaSLA employs a hierarchical linking strategy: it first identifies the optimal table linking and then links columns within the selected tables, which reduces the linking candidate space. In each linking step, it uses knapsack optimization to link potentially relevant elements while allowing only a limited tolerance for potentially redundant ones. With this optimization, KaSLA-1.6B achieves better schema linking results than large-scale LLMs, including DeepSeek-V3 equipped with the state-of-the-art (SOTA) schema linking method. Extensive experiments on the Spider and BIRD benchmarks show that substituting KaSLA for the schema linking processes of SOTA Text2SQL models significantly improves their SQL generation performance. The code is available at https://github.com/DEEP-PolyU/KaSLA.
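The hierarchical, knapsack-based selection described above can be illustrated with a minimal sketch. It is not KaSLA's implementation: the relevance scores, the choice of value = relevance p and weight = round((1 - p) * scale) as a proxy for redundancy, the integer tolerance budgets, and all function names (knapsack_select, link_elements, hierarchical_link) are hypothetical. The sketch solves a standard 0/1 knapsack by dynamic programming, applied first to tables and then only to the columns of the selected tables.

```python
from typing import Dict, List, Sequence


def knapsack_select(values: Sequence[float], weights: Sequence[int], capacity: int) -> List[int]:
    """0/1 knapsack by dynamic programming.

    Returns the sorted indices of items that maximize total value
    subject to the sum of their weights being <= capacity.
    """
    n = len(values)
    # best[i][c]: max total value using the first i items within capacity c
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: take item i-1
    # Backtrack: item i-1 was taken wherever the optimum changed
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return sorted(chosen)


def link_elements(scores: Dict[str, float], tolerance: int, scale: int = 10) -> List[str]:
    """Select schema elements from hypothetical relevance scores in [0, 1].

    ASSUMPTION (not from the paper): value = relevance p and
    weight = round((1 - p) * scale) as a redundancy proxy, so
    `tolerance` caps the total redundancy the selection may incur.
    """
    names = list(scores)
    probs = [min(max(scores[name], 0.0), 1.0) for name in names]
    weights = [round((1.0 - p) * scale) for p in probs]
    picked = knapsack_select(probs, weights, tolerance)
    return [names[i] for i in picked]


def hierarchical_link(table_scores: Dict[str, float],
                      column_scores: Dict[str, Dict[str, float]],
                      table_tolerance: int,
                      column_tolerance: int) -> Dict[str, List[str]]:
    """Link tables first, then link columns only inside the selected tables."""
    tables = link_elements(table_scores, table_tolerance)
    return {t: link_elements(column_scores.get(t, {}), column_tolerance) for t in tables}


if __name__ == "__main__":
    table_scores = {"singer": 0.9, "concert": 0.6, "stadium": 0.2}
    column_scores = {
        "singer": {"name": 0.9, "age": 0.6, "country": 0.1},
        "concert": {"concert_id": 0.8, "year": 0.3},
        "stadium": {"capacity": 0.5},
    }
    print(hierarchical_link(table_scores, column_scores, 5, 5))
    # {'singer': ['name', 'age'], 'concert': ['concert_id']}
```

With a table tolerance of 5, the sketch keeps singer and concert (weights 1 + 4 = 5) and drops stadium (weight 8); the columns of stadium are then never considered, which is how the two-stage design shrinks the candidate space. Because a high-confidence element receives a small weight, it consumes little of the budget, which is one way to favor keeping likely-relevant elements while capping redundant ones.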