Large language models (LLMs) have revolutionized natural language interfaces for databases, particularly in text-to-SQL conversion. However, current approaches often generate unreliable outputs when faced with ambiguity or insufficient context. We present Reliable Text-to-SQL (RTS), a novel framework that improves query generation reliability by incorporating abstention and human-in-the-loop mechanisms. RTS focuses on the critical schema linking phase, which identifies the key database elements needed to generate a SQL query. It autonomously detects potential errors during answer generation and responds by either abstaining or engaging in user interaction. A vital component of RTS is Branching Point Prediction (BPP), which applies statistical conformal techniques to the hidden layers of the LLM used for schema linking, providing probabilistic guarantees on schema linking accuracy. We validate our approach through comprehensive experiments on the BIRD benchmark, demonstrating significant improvements in robustness and reliability. Our findings highlight the potential of combining transparent-box LLMs with human-in-the-loop processes to create more robust natural language interfaces for databases. On the BIRD benchmark, our approach achieves near-perfect schema linking accuracy, autonomously involving a human when needed. Combined with query generation, we show that near-perfect schema linking paired with a small query generation model can nearly match the SOTA accuracy of a model orders of magnitude larger than the one we use.
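The abstract does not spell out how the conformal guarantee is obtained. As a minimal sketch of the split-conformal abstention idea that BPP builds on (function names, the `alpha` parameter, and the use of softmax-style confidence scores are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split conformal prediction: compute the (1 - alpha)-adjusted quantile
    of nonconformity scores from a held-out calibration set. Under
    exchangeability, prediction sets built with this threshold cover the
    true label with probability at least 1 - alpha."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n  # finite-sample correction
    return np.quantile(cal_scores, min(q, 1.0), method="higher")

def predict_or_abstain(class_probs, threshold):
    """Form the conformal prediction set (labels whose nonconformity
    1 - p is within the threshold). Commit to an answer only when the
    set is a singleton; otherwise abstain and defer to a human."""
    pred_set = [i for i, p in enumerate(class_probs) if 1.0 - p <= threshold]
    if len(pred_set) == 1:
        return pred_set[0]  # confident, single schema-linking answer
    return None             # ambiguous: abstain / ask the user

# Calibration: nonconformity = 1 - probability assigned to the true label.
cal_scores = np.array([0.05, 0.08, 0.10, 0.04, 0.09, 0.07, 0.06, 0.03, 0.10])
tau = conformal_threshold(cal_scores, alpha=0.1)

confident = predict_or_abstain([0.95, 0.03, 0.02], tau)  # singleton set
ambiguous = predict_or_abstain([0.50, 0.45, 0.05], tau)  # no confident label
```

In RTS the scores would come from a probe over the LLM's hidden layers rather than raw output probabilities, but the abstain-when-uncertain decision rule has this same shape.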