GenEdit: Compounding Operators and Continuous Improvement to Tackle Text-to-SQL in the Enterprise

Recent advances in Text-to-SQL, driven by large language models, are democratizing data access. Enterprise deployments nevertheless remain challenging because they must capture business-specific knowledge, handle complex queries, and meet expectations of continuous improvement. To address these issues, we designed and implemented GenEdit, a Text-to-SQL generation system that improves with user feedback. GenEdit builds and maintains a company-specific knowledge set, employs a pipeline of operators that decomposes SQL generation, and uses feedback to update its knowledge set and improve future SQL generations. We describe GenEdit's architecture, which consists of two core modules: (i) decomposed SQL generation and (ii) knowledge set edits based on user feedback. For generation, GenEdit leverages compounding operators to improve knowledge retrieval and to create a plan, expressed as chain-of-thought steps, that guides generation. GenEdit first retrieves relevant examples in an initial retrieval stage in which the original SQL queries are decomposed into sub-statements, clauses, or sub-queries. It then retrieves instructions and schema elements. Using the retrieved context, GenEdit generates a step-by-step plan in natural language for producing the query. Finally, GenEdit uses the plan to generate SQL, which minimizes the reasoning required of the model at that step and improves the generation of complex SQL. If necessary, GenEdit regenerates the query based on syntactic and semantic errors. Knowledge set edits are recommended through an interactive copilot, which lets users iterate on their feedback and regenerate SQL queries as needed. Each generation uses staged edits, which update the generation prompt. Once the feedback is submitted, the edits are merged after passing regression testing and obtaining approval, improving future generations.
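To make the generation flow concrete, the following is a minimal sketch of the stages named above: retrieval of examples, instructions, and schema elements, a natural-language plan, SQL generation from the plan, and regeneration on error. It is an illustration under stated assumptions, not GenEdit's implementation. The names `KnowledgeSet`, `retrieve`, `make_plan`, `check_sql`, `generate`, and `fake_llm` are hypothetical, retrieval is reduced to keyword overlap, the plan is assembled by a template rather than a model, and error checking is approximated by asking SQLite to compile the query.

```python
import re
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class KnowledgeSet:
    # Hypothetical layout; the abstract does not specify how knowledge is stored.
    examples: List[str] = field(default_factory=list)      # decomposed SQL fragments
    instructions: List[str] = field(default_factory=list)  # business rules in prose
    schema: List[str] = field(default_factory=list)        # CREATE TABLE statements


def words(text: str) -> set:
    """Lowercase alphabetic tokens of a string."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def retrieve(question: str, items: List[str], k: int = 2) -> List[str]:
    """Toy retrieval: up to k items sharing at least one word with the question."""
    scored = [(len(words(question) & words(item)), item) for item in items]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:k]]


def make_plan(question: str, examples: List[str],
              instructions: List[str], schema: List[str]) -> List[str]:
    """Template stand-in for the model-written step-by-step plan."""
    steps = [f"Answer the question: {question}"]
    steps += [f"Use table definition: {s}" for s in schema]
    steps += [f"Apply instruction: {i}" for i in instructions]
    steps += [f"Reuse fragment: {e}" for e in examples]
    return steps


def check_sql(sql: str, schema: List[str]) -> Optional[str]:
    """Return SQLite's error message if the query does not compile, else None."""
    conn = sqlite3.connect(":memory:")
    try:
        for ddl in schema:
            conn.execute(ddl)
        conn.execute(f"EXPLAIN {sql}")  # compiles the query without running it
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()


def generate(question: str, ks: KnowledgeSet,
             llm: Callable[[List[str], Optional[str]], str],
             max_attempts: int = 3) -> str:
    """Retrieve context, build a plan, then generate and regenerate on error."""
    plan = make_plan(question,
                     retrieve(question, ks.examples),
                     retrieve(question, ks.instructions),
                     retrieve(question, ks.schema))
    error = None
    for _ in range(max_attempts):
        sql = llm(plan, error)  # the previous error, if any, is fed back
        error = check_sql(sql, ks.schema)
        if error is None:
            return sql
    raise RuntimeError(f"no valid SQL after {max_attempts} attempts: {error}")


def fake_llm(plan: List[str], error: Optional[str]) -> str:
    """Stand-in for a model call: wrong column first, corrected after feedback."""
    if error is None:
        return "SELECT SUM(total) FROM orders"
    return "SELECT SUM(amount) FROM orders WHERE status = 'paid'"


ks = KnowledgeSet(
    examples=["WHERE status = 'paid'"],
    instructions=["revenue means the sum of amount over paid orders"],
    schema=["CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)"],
)
sql = generate("total revenue from paid orders", ks, fake_llm)
print(sql)
```

In this sketch the first attempt references a column that does not exist, so `check_sql` reports an error and the second attempt is accepted. A real system would replace `fake_llm` with a model call and would likely need richer semantic checks than compilation alone.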

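The feedback loop in the abstract (staged edits that influence generation, followed by a merge gated on regression testing and approval) can be sketched as follows. This is a minimal illustration under assumptions: `KnowledgeStore`, `stage`, `submit`, `prompt_context`, and `toy_generate` are hypothetical names, the generator is a deterministic stand-in for a model, and the regression suite is a plain mapping from question to expected SQL.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class KnowledgeStore:
    instructions: List[str] = field(default_factory=list)  # merged knowledge
    staged: List[str] = field(default_factory=list)        # edits under review

    def prompt_context(self) -> List[str]:
        """Instructions visible to a generation: merged plus staged edits."""
        return self.instructions + self.staged

    def stage(self, edit: str) -> None:
        """Record a proposed edit without merging it."""
        self.staged.append(edit)

    def submit(self, regression_suite: Dict[str, str],
               generate: Callable[[str, List[str]], str],
               approved: bool) -> bool:
        """Merge staged edits only if regression tests pass and approval is given."""
        candidate = self.instructions + self.staged
        for question, expected in regression_suite.items():
            if generate(question, candidate) != expected:
                return False  # staged edits are kept for further iteration
        if not approved:
            return False
        self.instructions = candidate
        self.staged = []
        return True


def toy_generate(question: str, instructions: List[str]) -> str:
    """Deterministic stand-in for a model call (an assumption, not GenEdit's generator)."""
    sql = "SELECT SUM(amount) FROM orders"
    if "revenue" in question and any("paid" in i for i in instructions):
        sql += " WHERE status = 'paid'"
    return sql


# Previously accepted behavior that new edits must not break.
suite = {"total amount of all orders": "SELECT SUM(amount) FROM orders"}

store = KnowledgeStore()
before = toy_generate("total revenue", store.prompt_context())
store.stage("revenue counts only paid orders")
preview = toy_generate("total revenue", store.prompt_context())  # uses the staged edit
merged = store.submit(suite, toy_generate, approved=True)
print(before, preview, merged, sep="\n")
```

Keeping staged edits separate from merged knowledge lets a user preview their effect on regenerated queries before anything becomes permanent, and a failed regression check or missing approval leaves the merged knowledge untouched.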