Translating users' natural language questions into SQL queries (i.e., NL2SQL) significantly lowers the barrier to accessing relational databases. The emergence of large language models (LLMs) has introduced a new paradigm for NL2SQL tasks and dramatically improved their capabilities. However, this raises a critical question: are we fully prepared to deploy NL2SQL models in production? To address this question, we present NL2SQL360, a multi-angle NL2SQL evaluation framework that helps researchers design and test new NL2SQL methods. Using NL2SQL360, we conduct a detailed comparison of leading NL2SQL methods across a range of application scenarios, such as different data domains and SQL characteristics, offering insights for selecting the most appropriate NL2SQL method for a specific need. Moreover, we explore the NL2SQL design space, leveraging NL2SQL360 to automate the identification of an optimal NL2SQL solution tailored to user-specific needs. Specifically, NL2SQL360 identifies an effective NL2SQL method, SuperSQL, which stands out on the Spider dataset under the execution accuracy metric. Remarkably, SuperSQL achieves competitive performance, with execution accuracy of 87% and 62.66% on the Spider and BIRD test sets, respectively.
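To make the execution accuracy metric mentioned above concrete, the following is a minimal sketch, not the evaluation code used by NL2SQL360 or by the Spider and BIRD benchmarks. It assumes that a prediction counts as correct when it runs without error and returns the same multiset of rows as the gold query, ignoring row order. The function names, the toy `singer` table, and the example queries are all hypothetical and exist only for illustration.

```python
import sqlite3
from collections import Counter


def execute(conn, sql):
    """Run a query and return its rows as a multiset, or None if it fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose result multisets match.

    Assumption: row order is ignored and a failing prediction counts as wrong.
    """
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = execute(conn, gold_sql)
        predicted = execute(conn, predicted_sql)
        if gold is not None and predicted == gold:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database, used only to exercise the metric.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bob', 45), (3, 'Cy', 22);
    """
)

pairs = [
    # Different SQL text, same result rows: counted as correct.
    ("SELECT name FROM singer WHERE age > 25",
     "SELECT name FROM singer WHERE age >= 26"),
    # Valid SQL, different result: counted as wrong.
    ("SELECT count(*) FROM singer",
     "SELECT count(*) FROM singer WHERE age > 25"),
    # Misspelled column, so the prediction fails to execute: counted as wrong.
    ("SELECT nme FROM singer",
     "SELECT name FROM singer"),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.2%}")
```

Comparing results rather than query text is what lets the first pair count as correct even though the two SQL strings differ. A stricter variant would compare ordered row lists when the gold query contains an `ORDER BY` clause.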