Pre-trained language models are among the most successful recent approaches to Text-to-SQL. Because SQL queries are structured, a seq2seq model must parse both the schema items (i.e., tables and columns) and the skeleton (i.e., SQL keywords). These coupled targets make it harder to produce the correct SQL query, especially when the query involves many schema items and logic operators. This paper proposes a ranking-enhanced encoding and skeleton-aware decoding framework that decouples schema linking from skeleton parsing. Specifically, for a seq2seq encoder-decoder model, the encoder is fed only the most relevant schema items rather than the full, unordered schema, which reduces the schema-linking effort during SQL parsing, and the decoder first generates the skeleton and then the actual SQL query, which implicitly constrains the SQL parsing. We evaluate the proposed framework on Spider and its three robustness variants: Spider-DK, Spider-Syn, and Spider-Realistic. The experimental results show that our framework delivers promising performance and robustness. Our code is available at https://github.com/RUCKBReasoning/RESDSQL.
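The two decoupled steps described in the abstract can be illustrated with a minimal Python sketch. It is not the released implementation: the relevance scores are hard-coded stand-ins for whatever a trained ranker would output, and the function names (`rank_schema`, `build_encoder_input`, `split_decoder_output`), the `top_tables`/`top_columns` cut-offs, the ` | ` separator, and the `_` placeholder in the skeleton are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

RankedSchema = List[Tuple[str, List[str]]]


def rank_schema(
    table_scores: Dict[str, float],
    column_scores: Dict[str, Dict[str, float]],
    top_tables: int = 2,
    top_columns: int = 2,
) -> RankedSchema:
    """Keep the highest-scoring tables and, per kept table, its highest-scoring columns."""
    kept_tables = sorted(table_scores, key=table_scores.get, reverse=True)[:top_tables]
    ranked: RankedSchema = []
    for table in kept_tables:
        cols = column_scores.get(table, {})
        kept_cols = sorted(cols, key=cols.get, reverse=True)[:top_columns]
        ranked.append((table, kept_cols))
    return ranked


def build_encoder_input(question: str, ranked_schema: RankedSchema) -> str:
    """Serialize the question followed by the filtered, ordered schema items."""
    parts = [question]
    for table, cols in ranked_schema:
        parts.append(f"{table} : " + " , ".join(f"{table}.{c}" for c in cols))
    return " | ".join(parts)


def split_decoder_output(output: str, sep: str = " | ") -> Tuple[str, str]:
    """Split a 'skeleton | sql' decoder output at the first separator."""
    skeleton, _, sql = output.partition(sep)
    return skeleton.strip(), sql.strip()


# Hypothetical scores; a real system would obtain them from a trained ranker.
table_scores = {"singer": 0.97, "concert": 0.12, "stadium": 0.05}
column_scores = {
    "singer": {"name": 0.91, "age": 0.88, "country": 0.10},
    "concert": {"year": 0.20, "theme": 0.10},
    "stadium": {"capacity": 0.03},
}

ranked = rank_schema(table_scores, column_scores, top_tables=1, top_columns=2)
encoder_input = build_encoder_input("List the names of singers older than 30.", ranked)

# Hypothetical decoder output: skeleton first, then the full query.
skeleton, sql = split_decoder_output(
    "select _ from _ where _ | select name from singer where age > 30"
)
```

Here `encoder_input` contains only the `singer` table with its two highest-scoring columns, so the seq2seq model sees a short, ordered schema instead of every table and column. Emitting the skeleton before the query means the keyword structure is already fixed in the decoder's context when the schema items are filled in, and only the part after the separator is kept as the final prediction.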