Previous text-to-SQL datasets and systems have primarily focused on user questions with clear intentions that can be answered. However, real user questions are often ambiguous, admitting multiple interpretations, or unanswerable due to a lack of relevant data. In this work, we construct PRACTIQ, a practical conversational text-to-SQL dataset consisting of ambiguous and unanswerable questions inspired by real-world user questions. We first identified four categories of ambiguous questions and four categories of unanswerable questions by studying existing text-to-SQL datasets. We then generate conversations with four turns: the initial user question, an assistant response seeking clarification, the user's clarification, and the assistant's clarified SQL response with a natural-language explanation of the execution results. For some ambiguous queries, instead of requesting user clarification, we directly generate helpful SQL responses that consider multiple interpretations of the ambiguity. To benchmark performance on ambiguous, unanswerable, and answerable questions, we implemented baselines based on various large language models (LLMs). Our approach involves two steps: question category classification and clarification SQL prediction. Our experiments reveal that state-of-the-art systems struggle to handle ambiguous and unanswerable questions effectively. We will release our code for data generation and experiments on GitHub.
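The two-step baseline described above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the `llm` callable, the prompts, and the category names are assumptions standing in for a real LLM client and the dataset's full taxonomy.

```python
from typing import Callable

# Illustrative top-level categories; PRACTIQ further divides ambiguous and
# unanswerable questions into four subcategories each.
CATEGORIES = ("answerable", "ambiguous", "unanswerable")


def classify_question(question: str, schema: str, llm: Callable[[str], str]) -> str:
    """Step 1: ask the LLM to label the question's category (hypothetical prompt)."""
    prompt = (
        f"Database schema:\n{schema}\n\n"
        f"User question: {question}\n"
        f"Label the question as one of {CATEGORIES}."
    )
    label = llm(prompt).strip().lower()
    # Fall back to the majority class if the model returns an unexpected label.
    return label if label in CATEGORIES else "answerable"


def respond(question: str, schema: str, llm: Callable[[str], str]) -> tuple[str, str]:
    """Step 2: branch on the predicted category to produce the assistant turn."""
    category = classify_question(question, schema, llm)
    if category == "ambiguous":
        # Seek clarification rather than committing to one interpretation.
        reply = llm(f"Ask a clarifying question for: {question}")
    elif category == "unanswerable":
        reply = "The schema does not contain the data needed to answer this."
    else:
        reply = llm(f"Write SQL for: {question}\nSchema: {schema}")
    return category, reply
```

In use, `llm` would wrap a real model API call; the branch structure mirrors the dataset's four-turn conversations, where an ambiguous first turn triggers a clarification request before any SQL is produced.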