Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs

Jinyang Li, Binyuan Hui, Ge Qu, Binhua Li, Jiaxi Yang, Bowen Li, Bailin Wang, Bowen Qin, Rongyu Cao, Ruiying Geng, Nan Huo, Xuanhe Zhou, Chenhao Ma, Guoliang Li, Kevin C. C. Chang, Fei Huang, Reynold Cheng, Yongbin Li

Text-to-SQL parsing, which aims to convert natural language instructions into executable SQL, has gained increasing attention in recent years. In particular, Codex and ChatGPT have shown impressive results on this task. However, most prevalent benchmarks, e.g., Spider and WikiSQL, focus on database schemas with few rows of database content, leaving a gap between academic study and real-world applications. To mitigate this gap, we present BIRD, a big benchmark for large-scale databases grounded in text-to-SQL tasks, containing 12,751 text-to-SQL pairs and 95 databases with a total size of 33.4 GB, spanning 37 professional domains. Our emphasis on database values highlights the new challenges of dirty database contents, external knowledge linking NL questions to database contents, and SQL efficiency, particularly in the context of massive databases. To solve these problems, text-to-SQL models must feature database value comprehension in addition to semantic parsing. The experimental results demonstrate the significance of database values in generating accurate SQL for big databases. Furthermore, even the most effective text-to-SQL model, i.e., ChatGPT, achieves only 40.08% in execution accuracy, which is still far from the human result of 92.96%, showing that challenges remain. We also provide an efficiency analysis to offer insights into generating efficient SQL that is beneficial to industry. We believe that BIRD will contribute to advancing real-world applications of text-to-SQL research. The leaderboard and source code are available at https://bird-bench.github.io/.
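To make the execution accuracy figure concrete, the following is a minimal sketch of how such a metric could be computed: a predicted query counts as correct when it returns the same set of rows as the reference query on the same database. This is an illustration under stated assumptions, not the benchmark's official evaluation script. The `account` table, its rows, the three query pairs, and the function names are all hypothetical. The second pair shows how dirty values (the same city stored as both 'NYC' and 'New York') can defeat a query that parses the question correctly but ignores the database contents.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same set of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to execute counts as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Compare as sets, so row order and duplicates are ignored (an assumption).
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose result sets agree."""
    matches = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return matches / len(pairs)


def build_demo_db():
    """Hypothetical in-memory database with inconsistently stored city names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER, city TEXT, balance REAL)")
    conn.executemany(
        "INSERT INTO account VALUES (?, ?, ?)",
        [(1, "NYC", 120.0), (2, "New York", 80.0), (3, "Boston", 50.0)],
    )
    return conn


# Hypothetical (predicted, gold) query pairs.
demo_pairs = [
    # Different SQL text, same result set: counted as correct.
    ("SELECT id FROM account WHERE balance > 60",
     "SELECT id FROM account WHERE balance >= 80"),
    # The prediction misses the 'NYC' spelling: counted as incorrect.
    ("SELECT id FROM account WHERE city = 'New York'",
     "SELECT id FROM account WHERE city IN ('NYC', 'New York')"),
    # The prediction references a non-existent column: counted as incorrect.
    ("SELECT idd FROM account",
     "SELECT id FROM account"),
]

if __name__ == "__main__":
    print(execution_accuracy(build_demo_db(), demo_pairs))
```

Comparing result sets rather than SQL strings is the point of an execution-based metric: syntactically different queries can be equally correct, while a well-formed query can still fail once it meets the actual stored values.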

