Multi-turn Text-to-SQL aims to translate a user's conversational utterances into executable SQL while preserving dialogue coherence and grounding in the target schema. However, most existing systems treat the task as simple text translation and follow a short-horizon paradigm: they generate one query per turn without execution, explicit verification, or refinement, which leads to non-executable or incoherent outputs. We present MTSQL-R1, an agentic training framework for long-horizon multi-turn Text-to-SQL. We cast the task as a Markov Decision Process (MDP) in which an agent interacts with (i) a database for execution feedback and (ii) a persistent dialogue memory for coherence verification, performing an iterative propose -> execute -> verify -> refine cycle until all checks pass. Experiments on CoSQL and SParC demonstrate that MTSQL-R1 consistently outperforms strong baselines, highlighting the importance of environment-driven verification and memory-guided refinement for conversational semantic parsing. Full recipes (including code, trained models, logs, and reasoning trajectories) will be released after internal review to support community research.
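The propose -> execute -> verify -> refine cycle can be illustrated with a minimal Python sketch. This is not the MTSQL-R1 implementation: the abstract does not specify the policy, the reward, or the form of the coherence check, so every name here (`DialogueMemory`, `solve_turn`, `scripted_policy`) is hypothetical, the policy is a scripted stand-in for a trained model, and the coherence check is a toy substring test assumed purely for illustration. Only the inference-time loop is shown, not the training procedure.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class DialogueMemory:
    """Hypothetical persistent memory: constraints carried over from earlier turns."""
    constraints: list = field(default_factory=list)
    turns: list = field(default_factory=list)  # accepted (utterance, sql) pairs


def execute(conn, sql):
    """Run the query against the database; return (ok, rows_or_error_message)."""
    try:
        return True, conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return False, str(exc)


def verify(sql, memory):
    """Toy coherence check (an assumption): earlier constraints must still appear."""
    missing = [c for c in memory.constraints if c.lower() not in sql.lower()]
    return not missing, missing


def solve_turn(conn, utterance, memory, propose, max_steps=4):
    """Iterate propose -> execute -> verify -> refine until all checks pass."""
    feedback = None
    for step in range(1, max_steps + 1):
        sql = propose(utterance, memory, feedback)   # propose (or refine)
        ok, result = execute(conn, sql)              # execute
        if not ok:
            feedback = f"execution error: {result}"
            continue
        coherent, missing = verify(sql, memory)      # verify against memory
        if not coherent:
            feedback = f"missing constraints: {missing}"
            continue
        memory.turns.append((utterance, sql))        # all checks passed
        return sql, result, step
    return None, None, max_steps


def scripted_policy(candidates):
    """Stand-in for a trained model: returns the next scripted candidate per call."""
    it = iter(candidates)

    def propose(utterance, memory, feedback):
        return next(it)

    return propose


# Toy database and a dialogue in which an earlier turn restricted results to France.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, country TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?, ?)",
    [("Ana", "France", 31), ("Bo", "France", 45), ("Cy", "Japan", 28)],
)
memory = DialogueMemory(constraints=["country = 'France'"])
policy = scripted_policy([
    "SELECT nme FROM singer WHERE age > 30",   # fails execution (unknown column)
    "SELECT name FROM singer WHERE age > 30",  # runs, but drops the earlier filter
    "SELECT name FROM singer WHERE country = 'France' AND age > 30 ORDER BY name",
])
sql, rows, steps = solve_turn(conn, "Which of them are older than 30?", memory, policy)
print(steps, rows)
```

In this toy run the first candidate is rejected by the database, the second by the memory check, and the third is accepted. A short-horizon system that emits one query per turn would have returned the first candidate unchanged.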