Text-to-SQL parsing, which aims to convert natural language instructions into executable SQL queries, has gained increasing attention in recent years. In particular, Codex and ChatGPT have shown impressive results on this task. However, most prevalent benchmarks, e.g., Spider and WikiSQL, focus on database schemas with few rows of database contents, leaving a gap between academic study and real-world applications. To mitigate this gap, we present BIRD, a big benchmark for large-scale database-grounded text-to-SQL tasks, containing 12,751 text-to-SQL pairs and 95 databases with a total size of 33.4 GB, spanning 37 professional domains. Our emphasis on database values highlights the new challenges of dirty database contents, external knowledge between NL questions and database contents, and SQL efficiency, particularly in the context of massive databases. To solve these problems, text-to-SQL models must feature database value comprehension in addition to semantic parsing. The experimental results demonstrate the significance of database values in generating accurate SQL for big databases. Furthermore, even the most effective text-to-SQL model, i.e., ChatGPT, achieves only 40.08% execution accuracy, which is still far from the human result of 92.96%, showing that challenges remain. We also provide an efficiency analysis to offer insights into generating text-to-efficient-SQL queries that are beneficial to industry. We believe that BIRD will contribute to advancing real-world applications of text-to-SQL research. The leaderboard and source code are available at https://bird-bench.github.io/.
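The execution accuracy figures quoted in the abstract compare what a predicted query returns against what the reference query returns. The following is a minimal sketch of that idea, not the benchmark's official evaluation script: it assumes results are compared as unordered sets of rows, and the `account` table, its contents, and the helper names `run_query` and `execution_accuracy` are hypothetical, chosen only for illustration.

```python
import sqlite3


def run_query(conn, sql):
    """Return the result rows of `sql` as a set, or None if execution fails."""
    try:
        return set(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose result sets match."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        predicted = run_query(conn, predicted_sql)
        gold = run_query(conn, gold_sql)
        # A prediction that fails to execute never counts as correct.
        if predicted is not None and predicted == gold:
            correct += 1
    return correct / len(pairs)


# Toy in-memory database (hypothetical schema, not taken from the benchmark).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER, district TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "Prague", 1200.0), (2, "Brno", 300.0), (3, "Prague", 50.0)],
)

pairs = [
    # Different SQL text, same result set: counted as correct.
    ("SELECT id FROM account WHERE district = 'Prague'",
     "SELECT id FROM account WHERE district IN ('Prague')"),
    # Wrong casing of a stored value: runs, but returns no rows.
    ("SELECT id FROM account WHERE district = 'prague'",
     "SELECT id FROM account WHERE district = 'Prague'"),
    # Misspelled column: fails to execute.
    ("SELECT idd FROM account",
     "SELECT id FROM account"),
    # Equivalent aggregate: counted as correct.
    ("SELECT COUNT(*) FROM account WHERE balance > 100",
     "SELECT COUNT(id) FROM account WHERE balance > 100"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"execution accuracy: {accuracy:.2%}")
```

The second pair shows why database value comprehension matters: the predicted query is structurally sound, yet it fails because the literal does not match how the value is stored in the database.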