Recent advances in LLM-based Text-to-SQL have achieved remarkable gains on public benchmarks such as BIRD and Spider. Yet these systems struggle to scale to realistic enterprise settings, which involve large, complex schemas, diverse SQL dialects, and expensive multi-step reasoning. Emerging agentic approaches show potential for adaptive reasoning but often suffer from inefficiency and instability: they repeat interactions with databases, produce inconsistent outputs, and occasionally fail to generate valid answers. To address these challenges, we introduce Agent Semantic Memory (AgentSM), an agentic framework for Text-to-SQL that builds and leverages interpretable semantic memory. Instead of relying on raw scratchpads or vector retrieval, AgentSM captures prior execution traces (or synthesizes curated ones) as structured programs that directly guide future reasoning. This design enables systematic reuse of reasoning paths, allowing agents to scale efficiently and reliably to larger schemas, more complex questions, and longer trajectories. Compared to state-of-the-art systems, AgentSM achieves higher efficiency on the Spider 2.0 benchmark, reducing average token usage by 25% and average trajectory length by 35%. It also improves execution accuracy, reaching a state-of-the-art 44.8% on the Spider 2.0 Lite benchmark.
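To make the contrast with raw scratchpads more concrete, the following is a minimal Python sketch. It is not the AgentSM implementation. It assumes that a trajectory can be recorded as a list of typed steps plus the final SQL, so that a later question can retrieve and reuse the step sequence. The names `Step`, `TraceProgram`, and `SemanticMemory`, the action labels, the example tables, and the word-overlap retrieval rule are all hypothetical illustrations.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """One hypothetical step of an agent trajectory (schema lookup, SQL execution, ...)."""
    action: str    # hypothetical labels, e.g. "describe_table" or "execute_sql"
    argument: str  # table name or SQL text


@dataclass
class TraceProgram:
    """A prior (or synthesized) trajectory stored as a structured, reusable program."""
    question: str
    steps: list[Step]
    final_sql: str

    def keywords(self) -> set[str]:
        return set(self.question.lower().split())


@dataclass
class SemanticMemory:
    programs: list[TraceProgram] = field(default_factory=list)

    def add(self, program: TraceProgram) -> None:
        self.programs.append(program)

    def retrieve(self, question: str, k: int = 1) -> list[TraceProgram]:
        """Return up to k stored programs ranked by word overlap with the question.

        Word overlap is a placeholder scoring rule chosen for this sketch only;
        ties are broken by insertion order, and zero-overlap programs are dropped.
        """
        query = set(question.lower().split())
        scored = [(len(query & p.keywords()), i, p) for i, p in enumerate(self.programs)]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [p for _, _, p in scored[:k]]


# Usage: store two hypothetical traces, then reuse the closest one as a plan.
memory = SemanticMemory()
sales_sql = "SELECT region, SUM(amount) FROM sales WHERE year = 2023 GROUP BY region"
memory.add(TraceProgram(
    "total sales per region in 2023",
    [Step("describe_table", "sales"), Step("execute_sql", sales_sql)],
    sales_sql,
))
memory.add(TraceProgram(
    "number of active users",
    [Step("describe_table", "users")],
    "SELECT COUNT(*) FROM users WHERE active",
))

hits = memory.retrieve("total sales per region in 2024")
plan = [step.action for step in hits[0].steps]  # step sequence an agent could follow
```

The point of the structured form is that the retrieved object is an ordered plan (`plan` here is the list of step actions) rather than free-form text, so an agent can skip exploratory database calls it already made on a similar question.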