The NL2SQL task parses natural language statements into SQL queries. Most state-of-the-art methods treat NL2SQL as a slot-filling task and rely on feature representation learning, but they overlook the explicit correlation features between the SELECT and WHERE clauses and the implicit correlation features between sub-tasks within a single clause. To address this issue, we propose the Clause Feature Correlation Decoupling and Coupling (CFCDC) model. First, it uses a feature representation decoupling method to separate the SELECT and WHERE clauses at the parameter level. Next, we introduce a multi-task learning architecture to decouple the implicit correlation feature representations between different SQL tasks within a specific clause. Finally, we present an improved feature representation coupling module that integrates the decoupled tasks in the SELECT and WHERE clauses and predicts the final SQL query. The proposed CFCDC model performs strongly on the WikiSQL dataset, with significant improvements in logical form accuracy and execution accuracy. The source code for the model will be made publicly available on GitHub.
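To make the slot-filling formulation concrete, the following minimal Python sketch shows how separately predicted SELECT and WHERE slots could be assembled into a single WikiSQL-style query. It is not the CFCDC model and contains no learned components. The names `SelectSlots`, `WhereSlots`, and `assemble_query` are hypothetical, introduced here only for illustration, and the operator lists are assumed to follow the usual WikiSQL conventions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed WikiSQL-style operator vocabularies (index -> SQL token).
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<"]


@dataclass
class SelectSlots:
    """Hypothetical output of the SELECT-clause sub-tasks."""
    column: int        # index of the selected column
    aggregation: int   # index into AGG_OPS


@dataclass
class WhereSlots:
    """Hypothetical output of the WHERE-clause sub-tasks."""
    # Each condition is (column index, operator index, value string).
    conditions: List[Tuple[int, int, str]] = field(default_factory=list)


def assemble_query(select: SelectSlots, where: WhereSlots,
                   headers: List[str], table: str = "table") -> str:
    """Fill the fixed SQL sketch with the predicted slot values."""
    col = headers[select.column]
    agg = AGG_OPS[select.aggregation]
    select_expr = f"{agg}({col})" if agg else col
    sql = f"SELECT {select_expr} FROM {table}"
    if where.conditions:
        # Naive quoting, for illustration only; not safe for real input.
        parts = [f"{headers[c]} {COND_OPS[o]} '{v}'"
                 for c, o, v in where.conditions]
        sql += " WHERE " + " AND ".join(parts)
    return sql


headers = ["player", "team", "points"]
query = assemble_query(SelectSlots(column=2, aggregation=5),
                       WhereSlots([(1, 0, "Lakers")]), headers)
print(query)
```

In this sketch the two slot containers are filled independently and only meet in `assemble_query`, which loosely mirrors the idea of handling the clauses separately before combining them into the final query.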