Large Language Models (LLMs) have demonstrated strong performance on a wide range of tasks. To unleash their potential on the Text-to-SQL task, we propose $R^3$ (Review-Rebuttal-Revision), a consensus-based multi-agent system for Text-to-SQL. On Spider and Bird, $R^3$ outperforms existing single-LLM Text-to-SQL systems, as well as existing multi-agent Text-to-SQL systems, by $1.3\%$ to $8.1\%$. Surprisingly, we find that with Llama-3-8B, $R^3$ outperforms chain-of-thought prompting by over 20\%, and even outperforms GPT-3.5 on the Spider development set.