Large language models (LLMs) have advanced Text-to-SQL, yet existing solutions still fall short of system-level reliability. The limitation lies not merely in individual modules (e.g., schema linking, reasoning, and verification) but, more critically, in the lack of structured orchestration that enforces correctness across the entire workflow. This gap motivates a paradigm shift: treating Text-to-SQL not as free-form language generation but as a software-engineering problem that demands structured, verifiable orchestration. We present DeepEye-SQL, a software-engineering-inspired framework that reframes Text-to-SQL as the development of a small software program, executed through a verifiable process guided by the Software Development Life Cycle (SDLC). DeepEye-SQL integrates four synergistic stages: it grounds ambiguous user intent through semantic value retrieval and robust schema linking; enhances fault tolerance with N-version SQL generation using diverse reasoning paradigms; ensures deterministic verification via a tool-chain of unit tests and targeted LLM-guided revision; and introduces confidence-aware selection, which clusters execution results to estimate confidence, then either takes a high-confidence shortcut or, in low-confidence cases, runs unbalanced pairwise adjudication, yielding a calibrated, quality-gated output. This SDLC-aligned workflow transforms ad hoc query generation into a disciplined engineering process. Using ~30B-parameter open-source LLMs without any fine-tuning, DeepEye-SQL achieves 73.5% execution accuracy on BIRD-Dev and 89.8% on Spider-Test, outperforming state-of-the-art solutions. This highlights that principled orchestration, rather than LLM scaling alone, is key to achieving system-level reliability in Text-to-SQL.
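The abstract names confidence-aware selection but does not spell out its mechanics. The sketch below is one minimal reading under stated assumptions, not DeepEye-SQL's implementation. Candidate queries are executed against SQLite and clustered by their result rows, with row order ignored. Confidence is taken as the largest cluster's share of all candidates. Above a threshold, the majority representative is returned directly, which is the "shortcut". Below it, a hypothetical `judge` callback, standing in for the LLM adjudicator, compares representatives pairwise. The function names, the 0.6 threshold, and the champion-vs-challenger pairing are illustrative assumptions. They do not reproduce the paper's unbalanced adjudication scheme.

```python
import sqlite3
from collections import defaultdict
from typing import Callable, List, Optional, Tuple

# Hypothetical adjudicator: given two SQL strings, return the preferred one.
# In the described system an LLM plays this role; here it is a plain callback.
Judge = Callable[[str, str], str]


def execution_signature(conn: sqlite3.Connection, sql: str) -> Optional[Tuple]:
    """Run one candidate; return its rows as an order-insensitive multiset, or None on failure."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None  # non-executable candidates join no cluster
    # Assumption: row order is ignored, so ORDER BY differences do not split clusters.
    return tuple(sorted(rows, key=repr))


def select_sql(conn: sqlite3.Connection, candidates: List[str], judge: Judge,
               threshold: float = 0.6) -> Optional[str]:
    """Confidence-aware selection sketch: cluster by execution result, then shortcut or adjudicate."""
    clusters = defaultdict(list)
    for sql in candidates:
        sig = execution_signature(conn, sql)
        if sig is not None:
            clusters[sig].append(sql)
    if not clusters:
        return None  # every candidate failed to execute

    # Largest cluster first; its share of all candidates is the confidence estimate.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    confidence = len(ranked[0]) / len(candidates)
    if confidence >= threshold:
        return ranked[0][0]  # high-confidence shortcut: no adjudication needed

    # Low confidence: the majority representative meets each other cluster's
    # representative in turn (a simple pairing assumed here for illustration).
    winner = ranked[0][0]
    for cluster in ranked[1:]:
        winner = judge(winner, cluster[0])
    return winner


# Toy usage: five candidates against an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("b", 5), ("c", 9)])
candidates = [
    "SELECT name FROM t WHERE score > 4",
    "SELECT name FROM t WHERE score >= 5",
    "SELECT name FROM t WHERE score > 4 ORDER BY name DESC",
    "SELECT name FROM t WHERE score > 8",
    "SELECT nme FROM t",  # typo: raises sqlite3.OperationalError
]
picked = select_sql(conn, candidates, judge=lambda a, b: a)  # 3 of 5 agree: shortcut taken
```

In the toy run, the first three candidates return the same rows, so they form one cluster. The misspelled query is discarded. Execution-based clustering lets syntactically different but semantically equivalent queries vote together. Ignoring row order is a simplification that would be wrong for questions where ordering matters.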