Recent LLM-based Text-to-SQL methods usually suffer from significant performance degradation on "huge" databases and on complex user questions that require multi-step reasoning. Moreover, most existing methods overlook the value of having LLMs use external tools and collaborate with other models. To address these challenges, we introduce MAC-SQL, a novel LLM-based multi-agent collaborative framework. Our framework comprises a core decomposer agent for Text-to-SQL generation with few-shot chain-of-thought reasoning, accompanied by two auxiliary agents that use external tools or models to acquire smaller sub-databases and refine erroneous SQL queries. The decomposer agent collaborates with the auxiliary agents, which are activated as needed and can be extended to accommodate new features or tools for effective Text-to-SQL parsing. We first use GPT-4 as the strong backbone LLM for all agent tasks to determine the upper bound of our framework. We then fine-tune an open-source instruction-following model, SQL-Llama, built on Code Llama 7B, to accomplish all tasks as GPT-4 does. Experiments show that SQL-Llama achieves an execution accuracy of 43.94, comparable to the baseline accuracy of 46.35 for vanilla GPT-4. At the time of writing, MAC-SQL+GPT-4 achieves an execution accuracy of 59.59 when evaluated on the BIRD benchmark, establishing a new state-of-the-art (SOTA) on its holdout test set (https://github.com/wbbeyourself/MAC-SQL).
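The three-agent flow described in the abstract (prune the database to a smaller sub-database, generate SQL with the decomposer, then refine erroneous queries) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `select_sub_schema`, `decompose_and_generate`, `refine`, and `answer` are hypothetical, the keyword-overlap pruning is an assumed stand-in for whatever tool or model the real framework uses, the prompts are invented for illustration, and `fake_llm` is a stub rather than GPT-4 or SQL-Llama. SQLite from the Python standard library plays the role of the external execution tool.

```python
import sqlite3
from typing import Callable, Dict, List

# Assumption: an "LLM" is any callable that maps a prompt string to a reply string.
LLM = Callable[[str], str]
Schema = Dict[str, List[str]]  # table name -> column names


def select_sub_schema(schema: Schema, question: str, max_tables: int = 2) -> Schema:
    """Auxiliary step 1 (assumed heuristic): keep the tables whose name or
    column names appear in the question; fall back to the full schema."""
    words = set(question.lower().replace("?", " ").split())
    scored = []
    for table, cols in schema.items():
        hits = sum(1 for name in [table, *cols] if name.lower() in words)
        scored.append((hits, table))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    kept = [table for hits, table in scored[:max_tables] if hits > 0]
    return {table: schema[table] for table in kept} or dict(schema)


def decompose_and_generate(llm: LLM, schema: Schema, question: str) -> str:
    """Core step: prompt the model with the pruned schema and the question."""
    schema_text = "\n".join(f"{t}({', '.join(c)})" for t, c in schema.items())
    prompt = (
        "Break the question into sub-questions, then write one SQLite query.\n"
        f"Schema:\n{schema_text}\nQuestion: {question}\nSQL:"
    )
    return llm(prompt).strip()


def refine(llm: LLM, conn: sqlite3.Connection, sql: str, max_rounds: int = 2) -> str:
    """Auxiliary step 2: execute the query; on a database error, ask the
    model for a corrected query and try again, up to max_rounds attempts."""
    for _ in range(max_rounds):
        try:
            conn.execute(sql).fetchall()
            return sql
        except sqlite3.Error as exc:
            sql = llm(f"Fix this SQL.\nSQL: {sql}\nError: {exc}\nFixed SQL:").strip()
    return sql


def answer(llm: LLM, conn: sqlite3.Connection, schema: Schema, question: str) -> str:
    """Run the three steps in order and return the final SQL string."""
    sub_schema = select_sub_schema(schema, question)
    sql = decompose_and_generate(llm, sub_schema, question)
    return refine(llm, conn, sql)


def fake_llm(prompt: str) -> str:
    """Stub model: returns a query with a misspelt column first, and a
    corrected query when asked to fix it."""
    if prompt.startswith("Fix this SQL"):
        return "SELECT name FROM singer WHERE age > 30"
    return "SELECT nme FROM singer WHERE age > 30"


def demo():
    """Build a toy in-memory database and run the pipeline end to end."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.execute("CREATE TABLE concert (title TEXT, year INTEGER)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ana", 41), ("Bo", 25)])
    schema = {"singer": ["name", "age"], "concert": ["title", "year"]}
    sql = answer(fake_llm, conn, schema, "Which singer has age above 30?")
    return sql, conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(demo())
```

In this toy run the first generated query fails on the misspelt column, so the refinement step is triggered once. A query that executes cleanly on the first attempt would skip the repair call entirely, which mirrors the abstract's point that auxiliary agents are activated only as needed.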