Despite the remarkable performance of large language models (LLMs) on text-to-SQL (SQL generation), producing correct SQL queries at the initial generation stage remains challenging. The SQL refinement task has therefore been introduced to correct syntactic and semantic errors in generated SQL queries. However, existing paradigms face two major limitations: (i) self-debugging is increasingly ineffective because modern LLMs rarely produce the explicit execution errors needed to trigger debugging signals; (ii) self-correction suffers from low detection precision, lacking explicit error modeling grounded in the question and schema, and from severe hallucination that frequently corrupts already-correct SQL queries. In this paper, we propose ErrorLLM, a framework that explicitly models text-to-SQL Errors within a dedicated LLM for text-to-SQL refinement. Specifically, we represent the user question and database schema as structural features, employ static detection to identify execution failures and surface mismatches, and extend ErrorLLM's semantic space with dedicated error tokens that capture categorized implicit semantic error types. Through a carefully designed training strategy, we explicitly model these errors with structural representations, enabling the LLM to detect complex implicit errors by predicting the dedicated error tokens. Guided by the detected errors, we then perform error-guided refinement of the SQL structure by prompting LLMs. Extensive experiments demonstrate that ErrorLLM achieves the largest improvements over the backbones' initial generations. Further analysis reveals that detection quality directly determines refinement effectiveness; ErrorLLM addresses both sides by attaining a high detection F1 score while maintaining refinement effectiveness.
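To make the pipeline concrete, the following is a minimal sketch of the detect-then-refine loop described above: static surface-mismatch detection against the schema emits dedicated error tokens, which then guide a refinement prompt. All names here (the error taxonomy, token strings, and helper functions) are hypothetical illustrations, not the paper's actual implementation, which additionally trains an LLM to predict error tokens for implicit semantic errors.

```python
import re

# Hypothetical error taxonomy; the paper's actual error categories are not listed here.
ERROR_TOKENS = {
    "unknown_table": "<ERR:UNKNOWN_TABLE>",
    "unknown_column": "<ERR:UNKNOWN_COLUMN>",
}


def static_detect(sql: str, schema: dict) -> list:
    """Surface-level static detection: flag tables/columns absent from the schema.

    `schema` maps table names to lists of column names.
    """
    errors = []
    # Check the table referenced in the FROM clause.
    m = re.search(r"\bfrom\s+(\w+)", sql, re.IGNORECASE)
    if m and m.group(1).lower() not in {t.lower() for t in schema}:
        errors.append(ERROR_TOKENS["unknown_table"])
    # Check identifiers in the SELECT list against all schema columns.
    all_columns = {c.lower() for cols in schema.values() for c in cols}
    sel = re.search(r"\bselect\s+(.*?)\s+from\b", sql, re.IGNORECASE | re.DOTALL)
    if sel:
        for ident in re.findall(r"[A-Za-z_]\w*", sel.group(1)):
            if ident.lower() not in all_columns:
                errors.append(ERROR_TOKENS["unknown_column"])
    return errors


def refinement_prompt(question: str, schema: dict, sql: str, errors: list) -> str:
    """Build an error-guided refinement prompt from the detected error tokens."""
    error_str = " ".join(errors) if errors else "<NO_ERROR>"
    return (
        f"Question: {question}\n"
        f"Schema: {schema}\n"
        f"Generated SQL: {sql}\n"
        f"Detected errors: {error_str}\n"
        "Fix only the flagged errors; keep the rest of the query unchanged."
    )
```

In this sketch the error tokens serve double duty: they are both the detector's output and the conditioning signal in the refinement prompt, mirroring how the detected errors guide the refinement step while an explicit `<NO_ERROR>` outcome leaves correct SQL untouched.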