We present SQL Query Engine, an open-source, self-hosted service that translates natural language questions into validated PostgreSQL queries through a two-stage LLM pipeline. The first stage performs automatic schema introspection and SQL generation; a multi-strategy response parser extracts SQL from any LLM output format (JSON, code blocks, or raw text) without requiring structured output APIs. The second stage executes the query against PostgreSQL and, upon failure or empty results, enters an iterative self-healing loop in which the LLM diagnoses the error using full SQLSTATE codes and PostgreSQL diagnostic messages. Two mechanisms prevent regressions: early-accept returns successful queries immediately without LLM re-evaluation, and best-result tracking preserves the best partial result across retries. Schema context is cached per session in Redis, progress events stream via Redis Pub/Sub and SSE, and an OpenAI-compatible /v1/chat/completions endpoint lets existing tools work without modification. All database connections are read-only at the driver level. We evaluate across five LLM backends on a synthetic benchmark (75 questions, three databases) where the self-healing loop yields up to +9.3pp accuracy gains with zero regressions on the best model (Llama 4 Scout 17B, 57.3%), and on BIRD (437 questions, 11 databases migrated from SQLite to PostgreSQL) where the full pipeline reaches 49.0% execution accuracy (GPT-OSS-120B, +4.6pp). Source code: https://github.com/codeadeel/sqlqueryengine.
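As an illustration of the multi-strategy response parser described above, the following is a minimal sketch, not the project's actual implementation. It tries three strategies in order (JSON, fenced code block, raw text). The function name `extract_sql`, the JSON key `"sql"`, and the fallback heuristic are assumptions made for this example.

```python
import json
import re
from typing import Optional

# Build the fence marker programmatically so this listing can itself sit in a fence.
_TICKS = "`" * 3
_FENCED = re.compile(_TICKS + r"(?:sql)?\s*(.*?)" + _TICKS, re.DOTALL | re.IGNORECASE)
# Heuristic fallback: first SELECT/WITH keyword up to the next semicolon or end of text.
_RAW = re.compile(r"\b(?:SELECT|WITH)\b.*?(?:;|\Z)", re.DOTALL | re.IGNORECASE)


def extract_sql(llm_output: str) -> Optional[str]:
    """Return the SQL found in an LLM reply, or None if no strategy matches."""
    text = llm_output.strip()

    # Strategy 1: the whole reply is a JSON object with a "sql" key (assumed key name).
    try:
        payload = json.loads(text)
        if isinstance(payload, dict) and isinstance(payload.get("sql"), str):
            return payload["sql"].strip()
    except json.JSONDecodeError:
        pass

    # Strategy 2: a fenced code block, optionally tagged as sql.
    match = _FENCED.search(text)
    if match:
        return match.group(1).strip()

    # Strategy 3: raw text containing a statement that starts with SELECT or WITH.
    match = _RAW.search(text)
    if match:
        return match.group(0).strip()

    return None
```

Ordering the strategies from most to least structured means a well-formed reply is never reinterpreted by the looser heuristics. The raw-text fallback is deliberately simple and can misfire on prose that contains the word "with".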
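The self-healing loop with early-accept and best-result tracking can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's code: `execute` and `repair` are hypothetical callables standing in for the PostgreSQL driver and the LLM, and the ranking used for "best" (an error-free but empty result beats a failed query) is an assumption, since the abstract does not define it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Rows = List[tuple]


@dataclass
class Attempt:
    sql: str
    rows: Rows
    error: Optional[str]  # e.g. "42703: column does not exist"; None if the query ran


def self_heal(
    sql: str,
    execute: Callable[[str], Tuple[Rows, Optional[str]]],  # hypothetical DB wrapper
    repair: Callable[[str, str], str],  # hypothetical LLM call: (sql, feedback) -> new sql
    max_retries: int = 3,
) -> Attempt:
    """Run sql, asking repair() for a new query after each failure or empty result."""
    best: Optional[Attempt] = None
    for attempt_no in range(max_retries + 1):
        rows, error = execute(sql)
        attempt = Attempt(sql, rows, error)

        # Early-accept: a query that ran and returned rows is returned immediately,
        # without asking the LLM to re-evaluate it.
        if error is None and rows:
            return attempt

        # Best-result tracking (assumed ranking): keep the first attempt, and let an
        # error-free attempt replace a failed one, so a retry cannot make things worse.
        if best is None or (best.error is not None and error is None):
            best = attempt

        if attempt_no < max_retries:
            feedback = error if error is not None else "query returned no rows"
            sql = repair(sql, feedback)

    assert best is not None  # the loop body runs at least once
    return best
```

Because the accepted result is fixed at the moment a query succeeds, later retries can only be reached when the current attempt failed or came back empty, which is what rules out regressions in this sketch.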