Large Language Models (LLMs) have demonstrated remarkable capabilities in translating natural language into SQL, yet existing methods still falter on complex queries that require multi-step, data-aware reasoning. We introduce DecoSearch, a training-free framework that addresses this by routing each query to the appropriate level of reasoning effort. A lightweight Schema Selector first prunes the full database schema to the relevant tables and columns. An LLM Judger then decides whether the question requires decomposition: straightforward questions follow a direct generation path, while complex ones are escalated to a Directed Acyclic Graph (DAG) of atomic sub-questions, each solved by a targeted SQL generation step. A RAG component grounds the decomposer with semantically similar training examples, and a Topology Refiner restructures the reasoning plan when execution failures signal a flawed decomposition rather than a fixable SQL error. DecoSearch achieves 70.53% execution accuracy on BIRD and 88.31% on Spider with a DeepSeek backbone, surpassing all training-free baselines while consuming an order of magnitude fewer tokens than competing methods. It also functions as a model-agnostic wrapper, consistently improving fine-tuned SQL generation backbones without any modification to the pipeline.
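The routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `SubQuestion`, `answer`, `select_schema`, `needs_decomposition`, `decompose`, and `generate_sql` are hypothetical, the LLM-backed components are passed in as plain callables, and the RAG grounding and Topology Refiner steps are omitted. It also assumes that the DAG has a single final sub-question whose SQL answers the original question.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List

Schema = Dict[str, List[str]]  # table name -> column names (assumed representation)


@dataclass
class SubQuestion:
    """One atomic node of the decomposition DAG (hypothetical structure)."""
    id: str
    text: str
    depends_on: List[str] = field(default_factory=list)


def answer(
    question: str,
    schema: Schema,
    select_schema: Callable[[str, Schema], Schema],
    needs_decomposition: Callable[[str, Schema], bool],
    decompose: Callable[[str, Schema], List[SubQuestion]],
    generate_sql: Callable[[str, Schema, Dict[str, str]], str],
) -> str:
    # Step 1: prune the schema to the tables and columns relevant to the question.
    pruned = select_schema(question, schema)

    # Step 2: simple questions take the direct generation path.
    if not needs_decomposition(question, pruned):
        return generate_sql(question, pruned, {})

    # Step 3: complex questions become a DAG of atomic sub-questions.
    nodes = {n.id: n for n in decompose(question, pruned)}
    if not nodes:  # nothing to decompose; fall back to direct generation
        return generate_sql(question, pruned, {})

    # TopologicalSorter takes node -> predecessors and yields predecessors first;
    # it raises graphlib.CycleError if the plan is not acyclic.
    order = TopologicalSorter({n.id: set(n.depends_on) for n in nodes.values()})

    solved: Dict[str, str] = {}
    last_sql = ""
    for node_id in order.static_order():
        node = nodes[node_id]
        # Each sub-question sees only the SQL of the nodes it depends on.
        context = {dep: solved[dep] for dep in node.depends_on}
        last_sql = solved[node_id] = generate_sql(node.text, pruned, context)

    # Assumption: the last node in topological order is the final sub-question.
    return last_sql
```

Passing the components as callables keeps the sketch independent of any particular backbone, which mirrors the abstract's description of the framework as a model-agnostic wrapper.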