Large language models (LLMs) have advanced Text-to-SQL, yet existing solutions still fall short of system-level reliability. The limitation is not merely in individual modules -- e.g., schema linking, reasoning, and verification -- but more critically in the lack of structured orchestration that enforces correctness across the entire workflow. This gap motivates a paradigm shift: treating Text-to-SQL not as free-form language generation but as a software-engineering problem that demands structured, verifiable orchestration. We present DeepEye-SQL, a software-engineering-inspired framework that reframes Text-to-SQL as the development of a small software program, executed through a verifiable process guided by the Software Development Life Cycle (SDLC). DeepEye-SQL integrates four synergistic stages: it grounds user intent through robust schema linking, enforcing relational closure; enhances fault tolerance with N-version SQL generation; ensures deterministic verification via a ``Syntax-Logic-Quality'' tool-chain that intercepts errors pre-execution; and introduces confidence-aware selection that leverages execution-guided adjudication to resolve ambiguity beyond simple majority voting. Leveraging open-source MoE LLMs (~30B total, ~3B activated parameters) without any fine-tuning, DeepEye-SQL achieves 73.5% execution accuracy on BIRD-Dev, 75.07% on the official BIRD-Test leaderboard, and 89.8% on Spider-Test, outperforming state-of-the-art solutions that rely on larger models or extensive training. This highlights that principled orchestration, rather than LLM scaling alone, is key to achieving system-level reliability in Text-to-SQL.
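To make the selection stage concrete, the following is a minimal sketch of execution-guided candidate selection, not the DeepEye-SQL implementation. It assumes a SQLite database and a list of already-generated candidate queries. The names `execute` and `select_candidate`, the toy `employee` table, and the 0.6 confidence threshold are all hypothetical and chosen for illustration. Candidates are grouped by their execution result, the vote share of the largest group serves as a confidence score, and a low score flags the question for further adjudication rather than trusting the majority outright.

```python
import sqlite3
from collections import defaultdict


def execute(conn, sql):
    """Run one candidate query; return a hashable result, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sorting makes the comparison insensitive to row order (a simplification:
    # queries that differ only in ORDER BY fall into the same group).
    return tuple(sorted(rows, key=repr))


def select_candidate(conn, candidates, threshold=0.6):
    """Group candidates by execution result and pick from the largest group.

    Returns (sql, confidence, needs_adjudication). The threshold is an
    assumed value for illustration, not one taken from the paper.
    """
    clusters = defaultdict(list)
    for sql in candidates:
        result = execute(conn, sql)
        if result is not None:  # candidates that fail to execute get no vote
            clusters[result].append(sql)
    if not clusters:
        return None, 0.0, True
    best = max(clusters.values(), key=len)
    # Confidence is the share of all candidates that agree with the winner.
    confidence = len(best) / len(candidates)
    return best[0], confidence, confidence < threshold


# Toy database and candidates (hypothetical data for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employee VALUES (1, 'Ann', 'eng', 120), (2, 'Bob', 'eng', 100), (3, 'Cy', 'ops', 90);
""")

candidates = [
    "SELECT name FROM employee WHERE dept = 'eng' ORDER BY salary DESC LIMIT 1",
    "SELECT name FROM employee WHERE salary = "
    "(SELECT MAX(salary) FROM employee WHERE dept = 'eng')",
    "SELECT name FROM employe WHERE dept = 'eng'",   # misspelled table: fails
    "SELECT name FROM employee WHERE dept = 'eng'",  # missing the top-1 filter
]

sql, confidence, needs_review = select_candidate(conn, candidates)
print(sql, confidence, needs_review)
```

In this toy run, two of the four candidates agree on the result, one fails to execute, and one returns a different result, so the winning group holds only half of the votes and the question is flagged for adjudication. A full system would hand such low-confidence cases to a stronger arbiter, which this sketch leaves out.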