Recent LLM-based Text-to-SQL methods usually suffer from significant performance degradation on "huge" databases and on complex user questions that require multi-step reasoning. Moreover, most existing methods neglect the crucial importance of LLMs utilizing external tools and of model collaboration. To address these challenges, we introduce MAC-SQL, a novel LLM-based multi-agent collaborative framework. Our framework comprises a core decomposer agent for Text-to-SQL generation with few-shot chain-of-thought reasoning, accompanied by two auxiliary agents that utilize external tools or models to acquire smaller sub-databases and to refine erroneous SQL queries. The decomposer agent collaborates with the auxiliary agents, which are activated as needed and can be expanded to accommodate new features or tools for effective Text-to-SQL parsing. In our framework, we initially leverage GPT-4 as the strong backbone LLM for all agent tasks to determine the upper bound of our framework. We then fine-tune an open-source instruction-following model, SQL-Llama, based on Code Llama 7B, to accomplish all tasks as GPT-4 does. Experiments show that SQL-Llama achieves a comparable execution accuracy of 43.94, against the baseline accuracy of 46.35 for vanilla GPT-4. At the time of writing, MAC-SQL+GPT-4 achieves an execution accuracy of 59.59 when evaluated on the BIRD benchmark, establishing a new state of the art (SOTA) on its holdout test set (https://github.com/wbbeyourself/MAC-SQL).
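The collaboration pattern described above — a selector that prunes the database to a relevant sub-schema, a decomposer that drafts SQL, and a refiner that repairs queries that fail to execute — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `Schema` class, the keyword-overlap selector, and the template-based decomposer are hypothetical stand-ins for the LLM-prompted agents in the real framework.

```python
# Minimal sketch of a MAC-SQL-style three-agent loop (hypothetical API;
# the actual framework lives at https://github.com/wbbeyourself/MAC-SQL).
from dataclasses import dataclass


@dataclass
class Schema:
    tables: dict  # table name -> list of column names


def selector(question: str, schema: Schema) -> Schema:
    # Selector agent: shrink the full database to a smaller sub-database by
    # keeping only tables whose name or columns are mentioned in the question.
    # (The real agent prompts an LLM; simple keyword overlap stands in here.)
    q = question.lower()
    kept = {t: cols for t, cols in schema.tables.items()
            if t.lower() in q or any(c.lower() in q for c in cols)}
    return Schema(kept or schema.tables)  # fall back to the full schema


def decomposer(question: str, schema: Schema) -> str:
    # Decomposer agent: in the paper this is few-shot chain-of-thought
    # prompting that breaks the question into sub-questions; a trivial
    # single-table template stands in for the generated SQL.
    table = next(iter(schema.tables))
    return f"SELECT * FROM {table}"


def refiner(sql: str, execute) -> str:
    # Refiner agent: run the SQL and, on an execution error, attempt a
    # repair (stubbed here as a placeholder rewrite).
    try:
        execute(sql)
        return sql
    except Exception:
        return sql + " LIMIT 1"  # placeholder repair step


schema = Schema({"singers": ["name", "age"], "concerts": ["venue"]})
question = "How old is each of the singers?"
sub = selector(question, schema)
sql = refiner(decomposer(question, sub), execute=lambda s: None)
```

The key design point the sketch mirrors is that the auxiliary agents are activated only as needed: the selector runs only when the schema is too large, and the refiner loops only while execution fails.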