This study presents a comparative analysis of a complex SQL benchmark, TPC-DS, with two existing text-to-SQL benchmarks, BIRD and Spider. Our findings reveal that TPC-DS queries exhibit a significantly higher level of structural complexity than the queries in the other two benchmarks. This underscores the need for more intricate benchmarks that simulate realistic scenarios effectively. To support this comparison, we devised several measures of structural complexity and applied them across all three benchmarks. The results can guide future research on more sophisticated text-to-SQL benchmarks. We also used 11 distinct large language models (LLMs) to generate SQL queries from the query descriptions provided by the TPC-DS benchmark. Each prompt incorporated both the query description given in the TPC-DS specification and the TPC-DS database schema. We compared the generated queries with the TPC-DS gold-standard queries using a series of fuzzy structure-matching techniques based on query features. The results show that current state-of-the-art generative AI models fall short in generating accurate decision-making queries, and that the accuracy of the generated queries is insufficient for practical real-world application.
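The abstract does not spell out the structural complexity measures or the fuzzy structure-matching procedure, so the following is only a minimal sketch of what feature-based comparison could look like. The feature set in `FEATURE_PATTERNS`, the helper names `query_features` and `feature_similarity`, and the overlap ratio used as a score are all assumptions made for illustration, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical feature set: the abstract does not list the actual measures.
FEATURE_PATTERNS = {
    "select": r"\bselect\b",
    "join": r"\bjoin\b",
    "where": r"\bwhere\b",
    "group_by": r"\bgroup\s+by\b",
    "order_by": r"\border\s+by\b",
    "cte": r"\bwith\b",
    "set_op": r"\b(?:union|intersect|except)\b",
    "aggregate": r"\b(?:sum|avg|count|min|max)\s*\(",
}


def query_features(sql: str) -> Counter:
    """Count occurrences of each structural feature in a SQL string.

    This is keyword counting on lowercased text, not real parsing, so
    keywords inside string literals or comments are miscounted.
    """
    text = sql.lower()
    return Counter(
        {name: len(re.findall(pattern, text))
         for name, pattern in FEATURE_PATTERNS.items()}
    )


def feature_similarity(generated: str, gold: str) -> float:
    """Overlap of feature counts in [0, 1]: sum of minima over sum of maxima.

    Assumed scoring rule for illustration; two queries with no counted
    features at all are treated as identical (score 1.0).
    """
    a, b = query_features(generated), query_features(gold)
    overlap = sum(min(a[k], b[k]) for k in FEATURE_PATTERNS)
    total = sum(max(a[k], b[k]) for k in FEATURE_PATTERNS)
    return overlap / total if total else 1.0


if __name__ == "__main__":
    gold = ("SELECT a, SUM(b) FROM t JOIN u ON t.id = u.id "
            "WHERE a > 1 GROUP BY a")
    generated = "SELECT a FROM t WHERE a > 1"
    print(query_features(gold))
    print(feature_similarity(generated, gold))
```

A score near 1 means the generated query uses roughly the same mix of clauses as the gold query; it says nothing about whether the two queries return the same rows. A production version would more plausibly derive features from a parsed syntax tree than from regular expressions.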