Text-to-SQL (T2S) conversion based on large language models (LLMs) has found a wide range of applications by leveraging the ability of LLMs to interpret query intent expressed in natural language. Existing research focuses on suitable representations of data schemas and/or questions, task-specific instructions and representative examples, and complicated inference pipelines. All of these methods are empirical and task-specific, and none of them offers a theoretical bound on performance. In this paper, we propose a simple, general, and performance-guaranteed T2S enhancement approach called Actor-Critic (AC). Specifically, we design two roles using the same LLM: an Actor that produces SQL queries and a Critic that evaluates the produced SQL. If the Critic judges the produced SQL to be wrong, it notifies the Actor to regenerate the SQL, and the new SQL is evaluated again. Through this simple iterative process, the expected performance can be derived theoretically. We conducted extensive experiments on Spider and related datasets with eleven LLMs and demonstrated that the Actor-Critic method consistently improves T2S performance, thus serving as a general enhancement approach for T2S conversion.
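The Actor-Critic loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The Actor and Critic are plain callables standing in for prompts to the same LLM. The `max_rounds` cap, the function names `actor_critic` and `expected_accuracy`, and the probabilistic model are all hypothetical. In particular, `expected_accuracy` uses a toy model of our own (independent Actor attempts correct with probability `p`; a Critic that accepts correct SQL with probability `a` and wrong SQL with probability `b`) to show one way an expected accuracy could be derived. It is not the paper's derivation.

```python
from typing import Callable, Tuple

# Hypothetical role signatures: in the paper both roles are played by the
# same LLM; here they are plain callables so the loop can run standalone.
ActorFn = Callable[[str, int], str]      # (question, attempt) -> SQL
CriticFn = Callable[[str, str], bool]    # (question, SQL) -> judged correct?


def actor_critic(question: str, actor: ActorFn, critic: CriticFn,
                 max_rounds: int = 5) -> Tuple[str, int]:
    """Regenerate SQL until the Critic accepts it or max_rounds is reached.

    max_rounds is an assumed safeguard; the abstract does not specify a cap.
    Returns the final SQL and the number of Actor calls made.
    """
    sql = ""
    for attempt in range(1, max_rounds + 1):
        sql = actor(question, attempt)   # Actor produces a candidate
        if critic(question, sql):        # Critic accepts -> stop iterating
            return sql, attempt
    return sql, max_rounds               # fall back to the last candidate


def expected_accuracy(p: float, a: float, b: float) -> float:
    """Accuracy of the unbounded loop under an illustrative toy model.

    Assumptions (not from the paper): each Actor attempt is independently
    correct with probability p; the Critic accepts correct SQL with
    probability a and wrong SQL with probability b. Each round ends with a
    correct answer w.p. p*a, a wrong answer w.p. (1-p)*b, and otherwise
    retries, so the final accuracy is p*a / (p*a + (1-p)*b).
    """
    denom = p * a + (1 - p) * b
    if denom == 0:
        raise ValueError("the loop never terminates under these parameters")
    return p * a / denom


if __name__ == "__main__":
    # Toy stand-ins for the LLM roles (hypothetical).
    def toy_actor(question: str, attempt: int) -> str:
        return ("SELECT name FROM singer" if attempt == 1
                else "SELECT COUNT(*) FROM singer")

    def toy_critic(question: str, sql: str) -> bool:
        return "COUNT" in sql

    print(actor_critic("How many singers are there?", toy_actor, toy_critic))
    # → ('SELECT COUNT(*) FROM singer', 2)
    print(round(expected_accuracy(0.7, 0.9, 0.3), 3))
    # → 0.875
```

Under this toy model, the loop beats single-shot accuracy `p` exactly when `a > b`, that is, when the Critic is more likely to accept correct SQL than wrong SQL. With `p = 0.7`, `a = 0.9`, and `b = 0.3`, accuracy rises from 0.7 to 0.875.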