While large language models (LLMs) have substantially improved Text-to-SQL generation, a pronounced gap remains between AI systems and human experts on challenging benchmarks such as BIRD-SQL. We argue this gap stems largely from the prevailing single-pass paradigm, which lacks the iterative reasoning, schema exploration, and error-correction behaviors that humans naturally employ. To address this limitation, we introduce SQL-Trail, a multi-turn reinforcement learning (RL) agentic framework for Text-to-SQL. Rather than producing a query in one shot, SQL-Trail interacts with the database environment and uses execution feedback to iteratively refine its predictions. Our approach centers on two key ideas: (i) an adaptive turn-budget allocation mechanism that scales the agent's interaction depth to match question difficulty, and (ii) a composite reward panel that jointly incentivizes SQL correctness and efficient exploration. Across benchmarks, SQL-Trail sets a new state of the art and delivers strong data efficiency, up to 18x higher than prior single-pass RL state-of-the-art methods. Notably, our 7B and 14B models outperform substantially larger proprietary systems by 5% on average, underscoring the effectiveness of interactive, agentic workflows for robust Text-to-SQL generation.
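To make the interaction pattern concrete, the following is a minimal sketch of a multi-turn loop with execution feedback, a difficulty-dependent turn budget, and a composite reward. It is not the SQL-Trail implementation: the budget table `TURN_BUDGET`, the functions `run_episode` and `composite_reward`, the `efficiency_weight` parameter, the scripted stand-in policy, and the toy `singer` table are all assumptions introduced for illustration, and the stopping rule (accept the first query that executes without error) is a deliberate simplification.

```python
import sqlite3
from typing import Callable, List, Tuple

# Hypothetical turn budgets per difficulty label; the values are illustrative only.
TURN_BUDGET = {"simple": 2, "moderate": 4, "challenging": 6}

# One history entry: (SQL that was tried, observation returned by the database).
Feedback = Tuple[str, str]
Policy = Callable[[str, List[Feedback]], str]


def execute(conn: sqlite3.Connection, sql: str) -> Tuple[bool, str]:
    """Run a query and return (ok, observation) as text feedback for the agent."""
    try:
        rows = conn.execute(sql).fetchall()
        return True, repr(rows)
    except sqlite3.Error as exc:
        return False, f"error: {exc}"


def run_episode(conn: sqlite3.Connection, question: str, difficulty: str,
                policy: Policy) -> Tuple[str, int]:
    """Let the policy refine its query until one executes or the budget runs out."""
    budget = TURN_BUDGET[difficulty]
    history: List[Feedback] = []
    for turn in range(1, budget + 1):
        sql = policy(question, history)
        ok, observation = execute(conn, sql)
        history.append((sql, observation))
        if ok:  # simplification: stop at the first query that runs without error
            return sql, turn
    return history[-1][0], budget


def composite_reward(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str,
                     turns_used: int, budget: int,
                     efficiency_weight: float = 0.1) -> float:
    """Correctness term plus a small bonus for using fewer turns (assumed form)."""
    ok_pred, pred = execute(conn, predicted_sql)
    ok_gold, gold = execute(conn, gold_sql)
    # Order-sensitive comparison of result rows; enough for this toy example.
    if not (ok_pred and ok_gold and pred == gold):
        return 0.0
    efficiency = 1.0 - (turns_used - 1) / budget
    return 1.0 + efficiency_weight * efficiency


def scripted_policy(question: str, history: List[Feedback]) -> str:
    """Stand-in for an LLM: guesses a wrong column first, then repairs it."""
    if not history:
        return "SELECT COUNT(*) FROM singer WHERE nation = 'France'"
    return "SELECT COUNT(*) FROM singer WHERE country = 'France'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, country TEXT)")
conn.executemany("INSERT INTO singer VALUES (?, ?)",
                 [("A", "France"), ("B", "Japan"), ("C", "France")])

final_sql, turns = run_episode(conn, "How many singers are from France?",
                               "moderate", scripted_policy)
reward = composite_reward(conn, final_sql,
                          "SELECT COUNT(*) FROM singer WHERE country = 'France'",
                          turns, TURN_BUDGET["moderate"])
print(final_sql, turns, reward)
```

In this toy run the first query fails on an unknown column, the error text is appended to the history, and the second query succeeds. A learned policy would replace `scripted_policy`, and in an RL setup the scalar returned by `composite_reward` would serve as the episode-level training signal.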