Generating sample data for testing SQL queries has long been an important task, with applications that include testing SQL queries used in data analytics and in application software, as well as testing student SQL queries. More recently, with the increasing use of text-to-SQL systems, test data has become key to validating generated queries. Earlier work on test data generation handled basic single-block SQL queries and simple nested SQL queries, but could not handle more complex queries. In this paper, we present a novel data generation approach designed to handle complex queries, and show its effectiveness on queries for which the earlier XData approach is less effective. We also show that it can outperform the state-of-the-art VeriEQL system in demonstrating the non-equivalence of queries.