Large Language Models (LLMs) have emerged as a powerful tool for advancing the Text-to-SQL task, significantly outperforming traditional methods. Nevertheless, because this is a nascent research field, there is still no consensus on the optimal prompt templates and design frameworks. Additionally, existing benchmarks do not adequately explore the performance of LLMs across the various sub-tasks of the Text-to-SQL process. This gap hinders both the assessment of LLMs' cognitive capabilities and the optimization of LLM-based solutions. To address these issues, we first construct a new dataset designed to mitigate the risk of overfitting in LLMs. We then formulate five evaluation tasks to comprehensively assess the performance of diverse methods across various LLMs throughout the Text-to-SQL process. Our study highlights the performance disparities among LLMs and proposes optimal in-context learning solutions tailored to each task. These findings offer valuable insights for developing LLM-based Text-to-SQL systems.