Text-to-SQL parsing has achieved remarkable progress under the Full Schema Assumption, in which the complete database schema is supplied to the model. However, this premise breaks down in real-world enterprise environments, where databases contain hundreds of tables and large volumes of noisy metadata. Rather than having the full schema injected upfront, an agent must actively identify and verify only the relevant subset, giving rise to the Unknown Schema scenario we study in this work. To address this, we propose TRUST-SQL (Truthful Reasoning with Unknown Schema via Tools). We formulate the task as a Partially Observable Markov Decision Process in which an autonomous agent follows a structured four-phase protocol to ground its reasoning in verified metadata. Crucially, this protocol provides a structural boundary for our novel Dual-Track GRPO strategy. By applying token-level masked advantages, this strategy isolates exploration rewards from execution outcomes to resolve credit assignment, yielding a 9.9% relative improvement over standard GRPO. Extensive experiments across five benchmarks demonstrate that the 4B and 8B variants of TRUST-SQL achieve average absolute improvements of 30.6% and 16.6%, respectively, over their base models. Remarkably, despite operating entirely without pre-loaded metadata, our framework consistently matches or surpasses strong baselines that rely on schema prefilling.
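The abstract does not specify how the two tracks are combined, so the following is only a minimal sketch of one plausible reading of "token-level masked advantages", not the authors' implementation. It assumes that each rollout in a GRPO group receives one scalar exploration reward and one scalar execution reward, that each reward is normalized separately within the group in the usual GRPO way (subtract the group mean, divide by the group standard deviation), and that a per-token 0/1 mask marks which tokens belong to the exploration phases. The function names and the `explore_masks` input are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantage: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dual_track_token_advantages(explore_rewards, exec_rewards, explore_masks):
    """Assign a per-token advantage to every rollout in one group (hypothetical sketch).

    explore_rewards: one exploration (schema-discovery) reward per rollout (assumption)
    exec_rewards:    one execution-outcome reward per rollout (assumption)
    explore_masks:   per-rollout list of 0/1 flags; 1 marks a token emitted during
                     the exploration phases, 0 a token in the final SQL phase (assumption)
    """
    # Normalize each reward track independently within the group.
    a_explore = group_advantages(explore_rewards)
    a_exec = group_advantages(exec_rewards)
    token_advs = []
    for i, mask in enumerate(explore_masks):
        # Each token takes the advantage of exactly one track, chosen by its mask,
        # so the execution outcome never updates exploration tokens and vice versa.
        token_advs.append([a_explore[i] if m else a_exec[i] for m in mask])
    return token_advs


# Two rollouts: the first explored well but produced wrong SQL,
# the second explored poorly but its SQL executed correctly.
advs = dual_track_token_advantages(
    explore_rewards=[1.0, 0.0],
    exec_rewards=[0.0, 1.0],
    explore_masks=[[1, 1, 0], [1, 0, 0]],
)
print([[round(a, 3) for a in row] for row in advs])
```

In this toy group, the first rollout's exploration tokens receive a positive advantage even though its final query failed, which is the credit-assignment separation the abstract describes. A single trajectory-level reward, as in standard GRPO, would instead push all of that rollout's tokens in the same direction.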