Large Language Models (LLMs) have emerged as a powerful tool for the Text-to-SQL task, significantly outperforming traditional methods. Nevertheless, because this is a nascent research field, there is still no consensus on the optimal prompt templates and design frameworks. Additionally, existing benchmarks inadequately explore the performance of LLMs across the various sub-tasks of the Text-to-SQL process, which hinders both the assessment of LLMs' cognitive capabilities and the optimization of LLM-based solutions. To address these issues, we first construct a new dataset designed to mitigate the risk of overfitting in LLMs. We then formulate five evaluation tasks to comprehensively assess the performance of diverse methods across various LLMs throughout the Text-to-SQL process. Our study highlights the performance disparities among LLMs and proposes optimal in-context learning solutions tailored to each task. These findings offer valuable insights for the development of LLM-based Text-to-SQL systems.