The growing volume of data in relational databases, together with the expertise required to write SQL queries, makes it difficult for users to access and analyze data. Text-to-SQL (Text2SQL) addresses these issues by using natural language processing (NLP) techniques to convert natural language questions into SQL queries. With the development of Large Language Models (LLMs), a range of LLM-based Text2SQL methods has emerged. This survey provides a comprehensive review of LLMs in Text2SQL tasks. We review the benchmark datasets, prompt engineering methods, fine-tuning methods, and base models used in LLM-based Text2SQL methods. We offer insights on each of these parts and discuss future directions for the field.