Conventional text-to-SQL parsers struggle to synthesize complex SQL queries that involve multiple tables or columns, because it is difficult to identify the correct schema items and to align them accurately with the question. To address this issue, we present MTSQL, a schema-aware multi-task learning framework for complicated SQL queries. Specifically, we design a schema linking discriminator module that distinguishes valid question-schema links from invalid ones, explicitly guiding the encoder with distinctive linking relations to improve alignment quality. On the decoder side, we define six types of relationships that describe the connections between tables and columns (e.g., WHERE_TC), and introduce an operator-centric triple extractor that recognizes the schema items associated with each predefined relationship. In addition, we build a set of grammar-constraint rules from the predicted triples to select the proper SQL operators and schema items during SQL generation. On Spider, a challenging cross-domain text-to-SQL benchmark, experimental results indicate that MTSQL is more effective than the baselines, especially in extremely hard scenarios. Further analyses verify that our approach yields promising improvements on complicated SQL queries.
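The idea of using predicted triples as grammar constraints can be illustrated with a minimal Python sketch. It is not the MTSQL implementation: the abstract names only the WHERE_TC relationship, so the triple layout (relation, table, column), the SELECT_TC label, the relation-to-clause mapping, and all function names below are assumptions made purely for illustration.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """A predicted triple; this (relation, table, column) layout is assumed."""
    relation: str  # e.g. "WHERE_TC", the one label named in the abstract
    table: str
    column: str


# Hypothetical mapping from relationship label to the SQL clause it licenses.
# Only WHERE_TC appears in the abstract; SELECT_TC is an invented placeholder.
RELATION_TO_CLAUSE = {
    "WHERE_TC": "WHERE",
    "SELECT_TC": "SELECT",
}


def allowed_clauses(triples: Iterable[Triple]) -> Set[str]:
    """Clauses that at least one predicted triple supports."""
    return {
        RELATION_TO_CLAUSE[t.relation]
        for t in triples
        if t.relation in RELATION_TO_CLAUSE
    }


def filter_candidates(
    candidates: Iterable[Tuple[str, str]],
    triples: Iterable[Triple],
    clause: str,
) -> List[Tuple[str, str]]:
    """Keep only the (table, column) candidates that a triple permits in `clause`."""
    allowed = {
        (t.table, t.column)
        for t in triples
        if RELATION_TO_CLAUSE.get(t.relation) == clause
    }
    return [c for c in candidates if c in allowed]
```

Under these assumptions, a decoder about to emit a WHERE clause would call `filter_candidates(candidates, triples, "WHERE")` and restrict its choice to the surviving schema items, while `allowed_clauses(triples)` indicates which clauses the predicted triples support at all.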