Requirements Coverage-Guided Minimization for Natural Language Test Cases

As software systems evolve, test suites tend to grow and often accumulate redundant test cases, which increases testing effort, time, and cost. Test suite minimization (TSM) aims to eliminate this redundancy while preserving key properties such as requirement coverage and fault detection capability. In this paper, we propose RTM (Requirement coverage-guided Test suite Minimization), a novel TSM approach designed for requirement-based testing (validation) that reduces test suite redundancy while ensuring full requirement coverage and a high fault detection rate (FDR) under a fixed minimization budget. Following common practice in critical systems where functional safety is important, we assume that test cases are specified in natural language and traced to requirements before being implemented. RTM preprocesses test cases using three different preprocessing methods and then converts them into vector representations using seven text embedding techniques. Similarity values between vectors are computed using three distance functions. A Genetic Algorithm, whose population is seeded by coverage-preserving initialization strategies, then searches for an optimized subset of diverse test cases that fits the given budget. We evaluate RTM on an industrial automotive system dataset comprising $736$ system test cases and $54$ requirements. Experimental results show that RTM consistently outperforms baseline techniques in terms of FDR across different minimization budgets while maintaining full requirement coverage. Furthermore, we investigate how the level of test suite redundancy affects the effectiveness of TSM, providing new insights into optimizing requirement-based test suites under practical constraints.
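
The abstract does not spell out the coverage-preserving initialization strategies or the fitness function, so the following is only a minimal sketch of one plausible reading, not RTM's actual implementation. It assumes that each test case is traced to a set of requirement identifiers, that embeddings are plain lists of floats, and that diversity is measured as mean pairwise cosine distance. All names (`traces`, `coverage_preserving_individual`, `diversity`) are hypothetical.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: treat a zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def covers_all(subset, traces, requirements):
    """Check that the tests in `subset` jointly cover every requirement."""
    covered = set()
    for test_id in subset:
        covered |= traces[test_id]
    return requirements <= covered


def coverage_preserving_individual(traces, requirements, budget, rng):
    """Build one candidate subset of exactly `budget` tests with full coverage.

    Hypothetical strategy: visit requirements in random order, pick a random
    covering test for each still-uncovered requirement, then pad with random
    remaining tests until the budget is reached.
    """
    selected, covered = set(), set()
    order = sorted(requirements)
    rng.shuffle(order)
    for req in order:
        if req in covered:
            continue
        candidates = sorted(t for t, reqs in traces.items() if req in reqs)
        if not candidates:
            raise ValueError(f"requirement {req!r} is not traced to any test")
        choice = rng.choice(candidates)
        selected.add(choice)
        covered |= traces[choice]
    if len(selected) > budget:
        raise ValueError("budget is too small for this covering set")
    remaining = sorted(set(traces) - selected)
    selected.update(rng.sample(remaining, budget - len(selected)))
    return selected


def diversity(subset, vectors):
    """Mean pairwise cosine distance; a stand-in for a GA fitness value."""
    items = sorted(subset)
    pairs = [(a, b) for i, a in enumerate(items) for b in items[i + 1:]]
    if not pairs:
        return 0.0
    total = sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs)
    return total / len(pairs)


# Toy data (illustrative only, not from the industrial dataset).
traces = {"tc1": {"R1"}, "tc2": {"R1", "R2"}, "tc3": {"R3"},
          "tc4": {"R2"}, "tc5": {"R3"}}
requirements = {"R1", "R2", "R3"}
vectors = {"tc1": [1.0, 0.0], "tc2": [0.8, 0.6], "tc3": [0.0, 1.0],
           "tc4": [0.6, 0.8], "tc5": [0.1, 0.9]}

rng = random.Random(0)
population = [coverage_preserving_individual(traces, requirements, 3, rng)
              for _ in range(4)]
best = max(population, key=lambda s: diversity(s, vectors))
```

Under these assumptions, every individual in the initial population already satisfies the coverage constraint, so a Genetic Algorithm would only need crossover and mutation operators that repair or reject coverage-violating offspring while it searches for more diverse subsets.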
