Text-to-SQL over large analytical databases requires navigating complex schemas, resolving ambiguous queries, and grounding decisions in actual data. Most current systems follow a fixed pipeline in which schema elements are retrieved once upfront and the database is revisited only for post-hoc repair, which limits recovery from early mistakes. We present FlexSQL, a text-to-SQL agent whose core design principle is flexible database interaction: the agent can explore schema structure, inspect data values, and run verification queries at any point during reasoning. FlexSQL generates diverse execution plans to cover multiple query interpretations, implements each plan in either SQL or Python depending on the task, and uses a two-tiered repair mechanism that can backtrack from code-level errors to plan-level revisions. On Spider2-Snow, using gpt-oss-120b, FlexSQL scores 65.4%, outperforming strong open-source baselines built on larger, more capable models such as gpt-o3 and DeepSeek-R1. When integrated into a general-purpose coding agent (as skills in Claude Code), our approach yields a relative improvement of over 10% on Spider2-Snow. Further analysis shows that flexible exploration and flexible execution jointly contribute to the effectiveness of our approach, highlighting flexibility as a key design principle. Our code is available at: https://github.com/StringNLPLAB/FlexSQL
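To make the two-tiered repair idea concrete, the following is a minimal sketch, not the FlexSQL implementation. It assumes that candidate plans are plain strings, that `write_code` and `repair_code` are hypothetical stand-ins for model calls, that every plan is implemented in SQL (the Python execution path is omitted), and that a plan-level revision can be simplified to moving on to the next candidate plan. SQLite stands in for the analytical database.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

Rows = List[tuple]


def run_sql(conn: sqlite3.Connection, sql: str) -> Tuple[Optional[Rows], Optional[str]]:
    """Execute a query and return (rows, None) or (None, error message)."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


def solve_with_two_tier_repair(
    conn: sqlite3.Connection,
    plans: List[str],
    write_code: Callable[[str], str],             # hypothetical: plan -> SQL
    repair_code: Callable[[str, str, str], str],  # hypothetical: (plan, sql, error) -> SQL
    max_code_repairs: int = 2,
) -> Optional[Tuple[str, str, Rows]]:
    """Tier 1 repairs the code for a plan; tier 2 abandons the plan for the next one."""
    for plan in plans:
        sql = write_code(plan)
        for attempt in range(max_code_repairs + 1):
            rows, error = run_sql(conn, sql)
            if error is None:
                return plan, sql, rows
            if attempt < max_code_repairs:
                # Tier 1: code-level repair driven by the execution error.
                sql = repair_code(plan, sql, error)
        # Tier 2: code-level repairs are exhausted, so backtrack to the plan level.
    return None


def demo() -> Optional[Tuple[str, str, Rows]]:
    """Toy run: the first plan is unrecoverable, the second needs one code fix."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10.0, "east"), (2, 20.0, "east"), (3, 5.0, "west")],
    )

    # Stub "model" outputs. The first query references a column that does not
    # exist; the second misspells the table name, which a code fix can repair.
    drafts = {
        "sum_by_state": "SELECT state, SUM(amount) FROM orders GROUP BY state",
        "sum_by_region": (
            "SELECT region, SUM(amount) FROM order GROUP BY region ORDER BY region"
        ),
    }

    def write_code(plan: str) -> str:
        return drafts[plan]

    def repair_code(plan: str, sql: str, error: str) -> str:
        # Stub repair that only knows how to fix the misspelled table name.
        return sql.replace("FROM order ", "FROM orders ")

    return solve_with_two_tier_repair(conn, list(drafts), write_code, repair_code)


if __name__ == "__main__":
    print(demo())
```

In this toy run the first plan fails every code-level repair because the `state` column does not exist, so the loop backtracks to the second plan, whose query succeeds after a single code-level fix. A fuller agent would presumably revise the failed plan using evidence gathered from the database rather than only switching to the next candidate.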