Understanding and Reusing Test Suites Across Database Systems

Database Management System (DBMS) developers have built extensive test suites for their systems. For example, the SQLite test suites contain over 92 million lines of code. Despite these efforts, test suites are not systematically reused across DBMSs, leading to wasted effort. Integration is challenging, as test suites use various test case formats and rely on unstandardized test runner features. We present a unified test suite, SQuaLity, in which we integrated test cases from three widely used DBMSs: SQLite, PostgreSQL, and DuckDB. In addition, we present an empirical study to determine the potential of reusing these systems' test suites. Our results indicate that reusing test suites is challenging. First, test formats and test runner commands vary widely; for example, SQLite has 4 test runner commands, while MySQL has 112 commands, including additional features such as executing file operations or interacting with a shell. Second, while some test suites contain mostly standard-compliant statements (e.g., 99% in SQLite), other test suites exercise considerably more non-standardized functionality (e.g., 31% of statements in the PostgreSQL test suite are non-standardized). Third, test reuse is complicated by various explicit and implicit dependencies, such as the need to set variables and configurations, certain test cases requiring extensions not present by default, and query results depending on specific clients. Despite these challenges, we identified 3 crashes, 3 hangs, and multiple compatibility issues across four different DBMSs by executing test suites across DBMSs, indicating the benefits of reuse. Overall, this work represents a first step towards test-case reuse in the context of DBMSs, and we hope that it will inspire follow-up work on this important topic.
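
To make the notion of a test case format with a small set of test runner commands more concrete, the following is a minimal sketch of a parser and runner for sqllogictest-style records, the style of format associated with SQLite's test suite. It is not SQuaLity's implementation: the names `Record`, `parse_records`, `run_records`, and `DEMO_SCRIPT` are hypothetical, only the `statement` and `query` commands are handled, and the column-type string, the sort mode, and conditional prefixes are deliberately ignored. The target DBMS is assumed to be an in-memory SQLite database accessed through Python's standard `sqlite3` module.

```python
import re
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Record:
    """One parsed test record (hypothetical unified representation)."""
    kind: str                      # "statement" or "query"
    sql: str                       # SQL text to execute
    expect_ok: bool = True         # only meaningful for statements
    expected: list = field(default_factory=list)  # one expected value per line


def parse_records(script: str) -> list:
    """Split a script into blank-line-separated records and parse each one."""
    records = []
    for block in re.split(r"\n\s*\n", script.strip()):
        # Drop comment lines; everything else belongs to the record.
        lines = [ln for ln in block.splitlines() if not ln.startswith("#")]
        if not lines:
            continue
        header = lines[0].split()
        if header[0] == "statement":
            # "statement ok" expects success, "statement error" expects failure.
            records.append(Record("statement", "\n".join(lines[1:]),
                                  expect_ok=(header[1] == "ok")))
        elif header[0] == "query":
            # SQL comes before the "----" separator, expected values after it.
            sep = lines.index("----")
            records.append(Record("query", "\n".join(lines[1:sep]),
                                  expected=lines[sep + 1:]))
        else:
            # Assumption: any other runner command is out of scope here.
            raise ValueError(f"unsupported runner command: {header[0]}")
    return records


def run_records(records: list, conn: sqlite3.Connection) -> list:
    """Execute each record and return one pass/fail flag per record."""
    outcomes = []
    for rec in records:
        try:
            rows = conn.execute(rec.sql).fetchall()
            succeeded = True
        except sqlite3.Error:
            rows, succeeded = [], False
        if rec.kind == "statement":
            outcomes.append(succeeded == rec.expect_ok)
        else:
            # Flatten rows into one string per value, in row-major order.
            actual = [str(value) for row in rows for value in row]
            outcomes.append(succeeded and actual == rec.expected)
    return outcomes


DEMO_SCRIPT = """\
statement ok
CREATE TABLE t(a INTEGER, b INTEGER)

statement ok
INSERT INTO t VALUES (1, 2), (3, 4)

statement error
INSERT INTO missing_table VALUES (1)

query II nosort
SELECT a, b FROM t ORDER BY a
----
1
2
3
4
"""

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_records(parse_records(DEMO_SCRIPT), connection))
```

Even this small sketch hints at why reuse is difficult: running the same records against another DBMS would require a different client library, a different way of rendering result values as text, and support for whatever additional runner commands that system's test suite relies on.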
