Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs

Jinyang Li, Binyuan Hui, Ge Qu, Binhua Li, Jiaxi Yang, Bowen Li, Bailin Wang, Bowen Qin, Rongyu Cao, Ruiying Geng, Nan Huo, Chenhao Ma, Kevin C. C. Chang, Fei Huang, Reynold Cheng, Yongbin Li

Text-to-SQL parsing, which aims to convert natural language instructions into executable SQL, has gained increasing attention in recent years. In particular, Codex and ChatGPT have shown impressive results on this task. However, most prevalent benchmarks, e.g., Spider and WikiSQL, focus on database schemas with few rows of database content, leaving a gap between academic study and real-world applications. To mitigate this gap, we present BIRD, a big benchmark for large-scale database grounded text-to-SQL tasks, containing 12,751 text-to-SQL pairs and 95 databases with a total size of 33.4 GB, spanning 37 professional domains. Our emphasis on database values highlights the new challenges of dirty database contents, external knowledge between NL questions and database contents, and SQL efficiency, particularly in the context of massive databases. To solve these problems, text-to-SQL models must feature database value comprehension in addition to semantic parsing. The experimental results demonstrate the significance of database values in generating accurate text-to-SQLs for big databases. Furthermore, even the most effective text-to-SQL model, i.e., ChatGPT, achieves only 40.08% execution accuracy, which is still far from the human result of 92.96%, showing that challenges remain. We also provide an efficiency analysis to offer insights into generating text-to-efficient-SQLs that are beneficial to industry. We believe that BIRD will contribute to advancing real-world applications of text-to-SQL research. The leaderboard and source code are available at https://bird-bench.github.io/.
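To make the execution accuracy figures above concrete, the following is a minimal sketch of how such a metric can be computed: a predicted query counts as correct when it returns the same set of rows as the reference query. It is an illustration under stated assumptions, not the official BIRD evaluation script. The helper names (`run_query`, `execution_accuracy`), the toy `account` table, and the choice to compare results as unordered sets are all hypothetical.

```python
import sqlite3


def run_query(conn, sql):
    """Run a query and return its rows as a set of tuples, or None if it fails."""
    try:
        return set(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose result sets match.

    Assumption: results are compared as unordered sets of rows; a predicted
    query that fails to execute counts as incorrect.
    """
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold_rows = run_query(conn, gold_sql)
        predicted_rows = run_query(conn, predicted_sql)
        if gold_rows is not None and predicted_rows == gold_rows:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database; real benchmark databases are far larger.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER, city TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "Prague", 120.0), (2, "Brno", 80.0), (3, "Prague", 45.5)],
)

pairs = [
    # Same rows, different ordering: counted as correct.
    ("SELECT id FROM account WHERE city = 'Prague'",
     "SELECT id FROM account WHERE city = 'Prague' ORDER BY id"),
    # Wrong literal casing for a stored value: returns no rows, so incorrect.
    ("SELECT id FROM account WHERE city = 'prague'",
     "SELECT id FROM account WHERE city = 'Prague'"),
    # Misspelled column name: fails to execute, so incorrect.
    ("SELECT idd FROM account", "SELECT id FROM account"),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.2%}")
```

The second pair shows why database values matter: the query is structurally valid, yet it fails because the literal does not match how the value is actually stored.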

