Large language models (LLMs) have revolutionized natural language interfaces for databases, particularly in text-to-SQL conversion. However, current approaches often generate unreliable outputs when faced with ambiguity or insufficient context. We present Reliable Text-to-SQL (RTS), a novel framework that improves query generation reliability by incorporating abstention and human-in-the-loop mechanisms. RTS focuses on the critical schema linking phase, which identifies the key database elements needed to generate a SQL query. It autonomously detects potential errors during answer generation and responds by either abstaining or engaging in user interaction. A vital component of RTS is Branching Point Prediction (BPP), which applies statistical conformal techniques to the hidden layers of the LLM used for schema linking, providing probabilistic guarantees on schema linking accuracy. We validate our approach through comprehensive experiments on the BIRD benchmark, demonstrating significant improvements in robustness and reliability. Our findings highlight the potential of combining transparent-box LLMs with human-in-the-loop processes to create more robust natural language interfaces for databases. On the BIRD benchmark, our approach achieves near-perfect schema linking accuracy, autonomously involving a human when needed. Combined with query generation, we show that near-perfect schema linking paired with a small query generation model can nearly match the SOTA accuracy of a model orders of magnitude larger than the one we use.
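The abstract does not spell out how the conformal guarantee is obtained. As a minimal sketch of the split-conformal abstention idea that BPP builds on (function names, the `alpha` parameter, and the use of softmax-style confidence scores are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split conformal prediction: compute the (1 - alpha)-adjusted quantile
    of nonconformity scores from a held-out calibration set. Under
    exchangeability, prediction sets built with this threshold cover the
    true label with probability at least 1 - alpha."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n  # finite-sample correction
    return np.quantile(cal_scores, min(q, 1.0), method="higher")

def predict_or_abstain(class_probs, threshold):
    """Form the conformal prediction set (labels whose nonconformity
    1 - p is within the threshold). Commit to an answer only when the
    set is a singleton; otherwise abstain and defer to a human."""
    pred_set = [i for i, p in enumerate(class_probs) if 1.0 - p <= threshold]
    if len(pred_set) == 1:
        return pred_set[0]  # confident, single schema-linking answer
    return None             # ambiguous: abstain / ask the user

# Calibration: nonconformity = 1 - probability assigned to the true label.
cal_scores = np.array([0.05, 0.08, 0.10, 0.04, 0.09, 0.07, 0.06, 0.03, 0.10])
tau = conformal_threshold(cal_scores, alpha=0.1)

confident = predict_or_abstain([0.95, 0.03, 0.02], tau)  # singleton set
ambiguous = predict_or_abstain([0.50, 0.45, 0.05], tau)  # no confident label
```

In RTS the scores would come from a probe over the LLM's hidden layers rather than raw output probabilities, but the abstain-when-uncertain decision rule has this same shape.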