The Multi-Criteria Test Suite Minimization (MCTSM) problem aims to remove redundant test cases, guided by adequacy criteria such as code coverage or fault detection capability. However, current techniques either lose a substantial share of fault detection ability or scale poorly because the problem is NP-hard, which limits their practical utility. We propose TripRL, a novel technique that integrates traditional criteria, such as statement coverage and fault detection ability, with test coverage similarity into an Integer Linear Program (ILP) to produce a diverse reduced test suite with high test effectiveness. TripRL uses a bipartite graph representation and its embedding to formulate the ILP concisely, and combines the ILP with effective reinforcement learning (RL) training. This combination makes large-scale test suite minimization more scalable and improves test effectiveness. Our empirical evaluations demonstrate that TripRL's runtime scales linearly with the size of the MCTSM problem. Notably, for large test suites from the Defects4J dataset, where existing approaches fail to provide solutions within a reasonable time frame, our technique consistently delivers solutions in less than 47 minutes. The reduced test suites produced by TripRL also maintain the original statement coverage and fault detection ability while having a higher potential to detect unknown faults.
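To make the three criteria concrete, the sketch below solves a toy instance: it keeps full statement coverage and full detection of known faults, minimizes the number of retained tests, and breaks ties by the lowest total pairwise coverage similarity. This is not TripRL's ILP or its RL training; exhaustive search stands in for a solver and is feasible only for very small suites. The function names (`jaccard`, `minimize_suite`), the lexicographic ordering of objectives, the use of Jaccard similarity, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two coverage sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def minimize_suite(coverage, faults):
    """Return (selected_tests, total_similarity) for a tiny MCTSM instance.

    coverage: dict mapping test name -> set of covered statements
    faults:   dict mapping test name -> set of known faults it detects

    Hard constraints: keep every covered statement and every known fault.
    Objectives (lexicographic, an assumption): fewest tests first, then the
    lowest sum of pairwise Jaccard similarity, i.e. the most diverse subset.
    """
    tests = sorted(coverage)
    all_stmts = set().union(*coverage.values())
    all_faults = set().union(*faults.values())

    # Try subset sizes in increasing order; the first size with a feasible
    # subset is the minimum size.
    for k in range(1, len(tests) + 1):
        best = None
        for subset in combinations(tests, k):
            covered = set().union(*(coverage[t] for t in subset))
            detected = set().union(*(faults.get(t, set()) for t in subset))
            if covered != all_stmts or detected != all_faults:
                continue  # violates a coverage or fault-detection constraint
            similarity = sum(
                jaccard(coverage[a], coverage[b])
                for a, b in combinations(subset, 2)
            )
            candidate = (similarity, subset)
            if best is None or candidate < best:
                best = candidate
        if best is not None:
            return list(best[1]), best[0]
    return [], 0.0


# Toy data (hypothetical): five tests, five statements, two known faults.
coverage = {
    "t1": {1, 2, 3},
    "t2": {3, 4},
    "t3": {1, 2, 3, 4},
    "t4": {4, 5},
    "t5": {1, 5},
}
faults = {
    "t1": {"f1"},
    "t2": set(),
    "t3": {"f1"},
    "t4": {"f2"},
    "t5": {"f2"},
}

selected, similarity = minimize_suite(coverage, faults)
print(selected, similarity)
```

In this toy instance three two-test subsets satisfy both constraints, and the similarity term is what separates them: `t1` and `t4` cover disjoint statements, so that pair is preferred over pairs that overlap. The search enumerates every subset, which grows exponentially with suite size and illustrates why exact approaches struggle on large suites.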