Research in Text-to-SQL conversion has been largely benchmarked against datasets where each text query corresponds to one correct SQL query. However, natural language queries over real-life databases frequently involve significant ambiguity about the intended SQL, due to overlapping schema names and multiple confusing relationship paths. To bridge this gap, we develop a novel benchmark called AmbiQT with over 3000 examples, in which each text is interpretable as two plausible SQL queries due to lexical and/or structural ambiguity. When faced with ambiguity, an ideal top-$k$ decoder should generate all valid interpretations so that the user can disambiguate among them. We evaluate several Text-to-SQL systems and decoding algorithms, including those employing state-of-the-art LLMs, and find them to be far from this ideal. The primary reason is that the prevalent beam search algorithm and its variants treat SQL queries as strings and produce unhelpful token-level diversity in the top-$k$. We propose LogicalBeam, a new decoding algorithm that navigates the SQL logic space using a blend of plan-based template generation and constrained infilling. Counterfactually generated plans diversify the templates, while infilling with a beam search that branches solely on schema names provides value diversity. LogicalBeam is up to $2.5$ times more effective than state-of-the-art models at generating all candidate SQLs in the top-$k$ ranked outputs. It also improves top-$5$ exact match and execution match accuracy on SPIDER and Kaggle DBQA.
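To make the idea of "a beam search that branches solely on schema names" concrete, here is a minimal Python sketch. It is not the authors' implementation. The template, the `@col`/`@table` slot markers, the candidate names, and their probabilities are all hypothetical stand-ins for what a trained model would produce. Fixed SQL tokens are copied without branching, so every hypothesis that survives in the beam differs from the others in at least one schema name rather than in surface tokens.

```python
import heapq
import math
from typing import Dict, List, Tuple

# A template is a list of tokens; tokens that appear as keys in the
# candidate dictionary (e.g. the hypothetical "@col", "@table") are
# schema-name slots to be infilled.
Template = List[str]


def infill_beam(
    template: Template,
    slot_candidates: Dict[str, Dict[str, float]],
    beam_width: int,
) -> List[Tuple[float, str]]:
    """Beam search that branches only at schema-name slots.

    Returns (cumulative log-probability, SQL string) pairs, best first.
    The probabilities are assumed inputs, standing in for model scores.
    """
    beam: List[Tuple[float, List[str]]] = [(0.0, [])]
    for token in template:
        if token in slot_candidates:
            # Branch: extend every hypothesis with every candidate name.
            expanded = [
                (score + math.log(p), prefix + [name])
                for score, prefix in beam
                for name, p in slot_candidates[token].items()
            ]
            # Keep the highest-scoring hypotheses (nlargest sorts descending).
            beam = heapq.nlargest(beam_width, expanded, key=lambda h: h[0])
        else:
            # No branching on fixed SQL keywords or punctuation.
            beam = [(score, prefix + [token]) for score, prefix in beam]
    return [(score, " ".join(prefix)) for score, prefix in beam]


if __name__ == "__main__":
    # Hypothetical lexical ambiguity: two plausible columns for "name".
    template = ["SELECT", "@col", "FROM", "@table"]
    candidates = {
        "@col": {"name": 0.5, "singer_name": 0.4, "age": 0.1},
        "@table": {"singer": 0.7, "artist": 0.3},
    }
    for score, sql in infill_beam(template, candidates, beam_width=3):
        print(f"{score:.3f}  {sql}")
```

With these made-up scores the top-3 are `SELECT name FROM singer`, `SELECT singer_name FROM singer`, and `SELECT name FROM artist`: each is a distinct interpretation, which is the kind of value diversity the abstract contrasts with token-level diversity. Template diversity (the counterfactual plans) would come from a separate step not modeled here.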