The task of SQL query equivalence checking is important in various real-world applications (including query rewriting and automated grading) that involve complex queries with integrity constraints; yet, state-of-the-art techniques are very limited in their capability of reasoning about complex features (e.g., those that involve sorting, case statement, rich integrity constraints, etc.) in real-life queries. To the best of our knowledge, we propose the first SMT-based approach and its implementation, VeriEQL, capable of proving and disproving bounded equivalence of complex SQL queries. VeriEQL is based on a new logical encoding that models query semantics over symbolic tuples using the theory of integers with uninterpreted functions. It is simple yet highly practical -- our comprehensive evaluation on over 20,000 benchmarks shows that VeriEQL outperforms all state-of-the-art techniques by more than one order of magnitude in terms of the number of benchmarks that can be proved or disproved. VeriEQL can also generate counterexamples that facilitate many downstream tasks (such as finding serious bugs in systems like MySQL and Apache Calcite).
翻译:SQL查询等价性检查任务在涉及具有完整性约束的复杂查询的各种实际应用(包括查询重写和自动评分)中至关重要;然而,现有技术在推理实际查询中的复杂特性(例如涉及排序、case语句、丰富完整性约束等)方面能力非常有限。据我们所知,我们提出了首个基于SMT的方法及其实现VeriEQL,能够证明和证伪复杂SQL查询的有界等价性。VeriEQL基于一种新的逻辑编码,该编码通过带有未解释函数的整数理论对符号元组上的查询语义进行建模。该方法简单但极具实用性——我们对超过20,000个基准的全面评估显示,在可证明或证伪的基准数量上,VeriEQL将所有现有技术的性能提升了一个数量级以上。VeriEQL还能生成反例,从而促进许多下游任务(例如在MySQL和Apache Calcite等系统中发现严重错误)。