Multi-turn Text-to-SQL aims to translate a user's conversational utterances into executable SQL while preserving dialogue coherence and grounding in the target schema. However, most existing systems treat the task as simple text translation and follow a short-horizon paradigm, generating one query per turn without execution, explicit verification, or refinement, which leads to non-executable or incoherent outputs. We present MTSQL-R1, an agentic training framework for long-horizon multi-turn Text-to-SQL. We cast the task as a Markov Decision Process (MDP) in which an agent interacts with (i) a database for execution feedback and (ii) a persistent dialogue memory for coherence verification, performing an iterative propose -> execute -> verify -> refine cycle until all checks pass. Experiments on CoSQL and SParC demonstrate that MTSQL-R1 consistently outperforms strong baselines, highlighting the importance of environment-driven verification and memory-guided refinement for conversational semantic parsing. Full recipes (including code, trained models, logs, and reasoning trajectories) will be released after internal review to support community research.
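To make the propose -> execute -> verify -> refine cycle concrete, the following is a minimal sketch of one dialogue turn and not the MTSQL-R1 implementation. It assumes an in-memory SQLite database as the execution environment and a plain list of (utterance, SQL) pairs as the dialogue memory. The names `run_turn`, `toy_propose`, and `toy_verify` are hypothetical; `toy_propose` is a hard-coded stand-in for the trained policy, and `toy_verify` is a stand-in for the coherence check.

```python
import sqlite3


def execute(conn, sql):
    """Run SQL against the database; return (rows, error), error is None on success."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


def run_turn(conn, utterance, memory, propose, verify, max_steps=4):
    """Iterate propose -> execute -> verify -> refine until all checks pass."""
    feedback = None
    for _ in range(max_steps):
        sql = propose(utterance, memory, feedback)        # propose (or refine)
        rows, error = execute(conn, sql)                  # execute
        if error is not None:
            feedback = f"execution error: {error}"
            continue
        issue = verify(utterance, sql, rows, memory)      # verify against memory
        if issue is None:
            memory.append((utterance, sql))               # persist the accepted turn
            return sql, rows
        feedback = f"coherence issue: {issue}"
    return None, None                                     # step budget exhausted


def toy_propose(utterance, memory, feedback):
    """Hypothetical policy stub: improves its query as feedback arrives."""
    if feedback is None:
        return "SELECT nme FROM singer"                   # misspelled column
    if feedback.startswith("execution error"):
        return "SELECT name FROM singer"                  # runs, but drops the prior filter
    return "SELECT name FROM singer WHERE age > 30"


def toy_verify(utterance, sql, rows, memory):
    """Hypothetical coherence check: the previous turn's filter must be kept."""
    if memory and "age > 30" in memory[-1][1] and "age > 30" not in sql:
        return "filter from previous turn was dropped"
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 35), ("Bob", 25)])

memory = [("Show singers older than 30", "SELECT * FROM singer WHERE age > 30")]
sql, rows = run_turn(conn, "Just their names", memory, toy_propose, toy_verify)
print(sql, rows)
```

In this toy run the first proposal fails at execution, the second executes but breaks coherence with the previous turn, and the third passes both checks and is written to memory. A short-horizon system would have stopped at the first proposal.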