Text-to-SQL aims to convert a natural language question into its corresponding SQL query in the context of relational tables. Existing text-to-SQL parsers generate a "plausible" SQL query for any user question, and therefore fail to handle problematic questions correctly. To formalize this problem, we conduct a preliminary study of the ambiguous and unanswerable cases observed in text-to-SQL and summarize them into six feature categories. For each category, we identify its causes and propose requirements for handling ambiguous and unanswerable questions. Building on this study, we propose a simple yet effective counterfactual example generation approach that automatically produces ambiguous and unanswerable text-to-SQL examples. Furthermore, we propose a weakly supervised DTE (Detecting-Then-Explaining) model for error detection, localization, and explanation. Experimental results show that our model achieves the best results on both real-world and generated examples compared with various baselines. We release our data and code at \href{https://github.com/wbbeyourself/DTE}{https://github.com/wbbeyourself/DTE}.