Large Language Models (LLMs) can translate natural language into SQL, but small models struggle with multi-table and complex queries in Zero-Shot Learning (ZSL) settings. While Supervised Fine-Tuning (SFT) helps, it falls short for harder cases. To address this, we study how different reasoning strategies (general-purpose reasoning in ZSL, reasoning traces in SFT, and Reinforcement Learning with Verifiable Reward (RLVR) with novel reward functions) affect Text2SQL performance across four benchmarks. We show that partial scoring rewards, computed via SQL execution, are crucial for guiding models even when outputs are not fully correct. These fine-grained signals lead to consistently better Text2SQL outcomes. Small LLMs benefit most from reasoning-aware SFT and RL, with the 14B Qwen-Coder-2.5 surpassing 400B+ models on challenging datasets like BIRD.
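To make the idea of an execution-based partial reward concrete, the following is a minimal sketch. It is not the reward function used in this work, whose exact form is not given here. It assumes a SQLite database and scores a predicted query by the F1 overlap between its result rows and those of the reference query, so a query that runs and returns partly correct rows earns more than one that fails outright. The names `partial_execution_reward`, `run_query`, and `build_demo_db`, as well as the toy `singer` table, are hypothetical and introduced only for illustration.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Execute sql and return its rows, or None if SQLite rejects the query."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def partial_execution_reward(conn, predicted_sql, gold_sql):
    """Illustrative partial reward in [0, 1] (an assumption, not the paper's definition).

    0.0 if the predicted query fails to execute; otherwise the F1 score
    between the multisets of rows returned by the predicted and gold queries.
    """
    gold_rows = run_query(conn, gold_sql)
    if gold_rows is None:
        raise ValueError("the gold query must be executable")

    pred_rows = run_query(conn, predicted_sql)
    if pred_rows is None:
        return 0.0  # syntax or schema error: no credit

    gold_counts, pred_counts = Counter(gold_rows), Counter(pred_rows)
    if not gold_counts and not pred_counts:
        return 1.0  # both results are empty, so they match exactly

    # Multiset intersection counts each shared row at most as often as it
    # appears in both results.
    overlap = sum((gold_counts & pred_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


def build_demo_db():
    """Create a small in-memory database used only for this example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (id INTEGER, name TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?, ?)",
        [(1, "Ana", "FR"), (2, "Bo", "US"), (3, "Cy", "FR"), (4, "Di", "DE")],
    )
    return conn
```

With the toy database, a prediction that returns the two correct rows plus one extra row receives partial credit, while a query that does not parse receives none. A binary execution-accuracy reward would score both as zero, which is the coarse signal that a partial reward is meant to refine.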