Generation-based testing techniques have proven effective at detecting logic bugs in DBMSs, which are often caused by incorrect implementations of query optimizers. However, existing generation-based debugging tools are limited to single-table queries, leaving a substantial research gap for multi-table queries with join operators. In this paper, we propose TQS, a novel testing framework for detecting logic bugs triggered by queries involving multi-table joins. Given a target DBMS, TQS achieves this goal with two key components: Data-guided Schema and Query Generation (DSG) and Knowledge-guided Query Space Exploration (KQE). DSG addresses the key challenge of multi-table query debugging: how to generate ground-truth (query, result) pairs for verification. It applies database normalization to generate a testing schema and maintains a bitmap index for result tracking. To improve debugging efficiency, DSG also deliberately injects noise into the generated data. To avoid redundant exploration of the query space, KQE formulates the problem as isomorphic graph set discovery and combines graph embedding with weighted random walks for query generation. We evaluated TQS on four popular DBMSs: MySQL, MariaDB, TiDB, and the gray release of an industry-leading cloud-native database, anonymized as X-DB. Experimental results show that TQS is effective at finding logic bugs in the join optimization of database management systems. It detected 115 bugs within 24 hours: 31 in MySQL, 30 in MariaDB, 31 in TiDB, and 23 in X-DB.
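As an illustration only, the following sketch shows one way a bitmap index over normalized data can yield ground-truth results for a join query. It is not TQS's implementation; it assumes a hypothetical scheme in which the generated data starts as a single wide table that is decomposed into normalized tables, each wide-table row is assigned a bit position, and one bitmap per (column, value) pair records which rows hold that value. Under these assumptions, the expected result of a join that reconstructs the wide table, filtered by equality predicates, is obtained by ANDing bitmaps rather than executing a join. All names (`WIDE_ROWS`, `normalize`, `build_bitmaps`, `ground_truth`, `nested_loop_join`) are hypothetical, and `nested_loop_join` merely stands in for the DBMS under test.

```python
from collections import defaultdict

# Hypothetical wide table: each row is one tuple before normalization.
WIDE_ROWS = [
    {"order_id": 1, "cust_id": 10, "cust_city": "Paris", "amount": 50},
    {"order_id": 2, "cust_id": 10, "cust_city": "Paris", "amount": 75},
    {"order_id": 3, "cust_id": 20, "cust_city": "Oslo", "amount": 20},
    {"order_id": 4, "cust_id": 30, "cust_city": "Rome", "amount": 90},
]


def normalize(rows, key, attrs):
    """Project the wide table onto (key, *attrs) and deduplicate by key,
    assuming the functional dependency key -> attrs holds."""
    seen = {}
    for r in rows:
        seen[r[key]] = {key: r[key], **{a: r[a] for a in attrs}}
    return list(seen.values())


def build_bitmaps(rows):
    """Map (column, value) -> int bitmap whose bit i is set iff row i has that value."""
    bitmaps = defaultdict(int)
    for i, r in enumerate(rows):
        for col, val in r.items():
            bitmaps[(col, val)] |= 1 << i
    return bitmaps


def ground_truth(rows, bitmaps, predicates, columns):
    """Expected result of the wide-table-reconstructing join under equality
    predicates, computed by ANDing bitmaps instead of running a join."""
    mask = (1 << len(rows)) - 1  # start with every row selected
    for col, val in predicates:
        mask &= bitmaps.get((col, val), 0)
    return sorted(
        tuple(rows[i][c] for c in columns)
        for i in range(len(rows))
        if (mask >> i) & 1
    )


def nested_loop_join(orders, customers, city):
    """Stand-in for the system under test: join the normalized tables directly."""
    return sorted(
        (o["order_id"], o["amount"])
        for o in orders
        for c in customers
        if o["cust_id"] == c["cust_id"] and c["cust_city"] == city
    )


orders = normalize(WIDE_ROWS, "order_id", ["cust_id", "amount"])
customers = normalize(WIDE_ROWS, "cust_id", ["cust_city"])
bitmaps = build_bitmaps(WIDE_ROWS)
expected = ground_truth(WIDE_ROWS, bitmaps, [("cust_city", "Paris")], ["order_id", "amount"])
actual = nested_loop_join(orders, customers, "Paris")
print(expected == actual)  # a mismatch would signal a logic bug
```

The appeal of this design is that the oracle never executes a join itself: its cost is a handful of integer AND operations, and its correctness depends only on the decomposition being lossless, not on any optimizer logic shared with the system under test.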