Unit testing is critical to the software development process because it checks the correctness of a program's basic units (e.g., a method). Search-based software testing (SBST) is an automated approach to generating test cases: given a coverage criterion (e.g., branch coverage), it generates test cases with genetic algorithms. However, a good test suite must have several different properties, which no single coverage criterion can capture. The state-of-the-art approach therefore combines multiple criteria to generate test cases. Because combining multiple coverage criteria introduces many more optimization objectives, it lowers the test suites' coverage for certain criteria compared with using a single criterion alone. To cope with this problem, we propose a novel approach named \textbf{smart selection}. Based on the coverage correlations among criteria and the subsumption relationships among coverage goals, smart selection selects a subset of coverage goals, reducing the number of optimization objectives while still preserving the properties of all criteria. First, we evaluate smart selection on $400$ Java classes with three state-of-the-art genetic algorithms under a $2$-minute budget. On average, smart selection outperforms combining all goals on $65.1\%$ of the classes for which the two approaches differ significantly. Second, we conduct experiments to verify our assumptions about coverage criteria relationships. Finally, we assess the coverage performance of smart selection under varying budgets of $5$, $8$, and $10$ minutes and explore its effect on bug detection; the results confirm the advantage of smart selection over combining all goals.
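
To make the idea of subsumption-based goal selection concrete, the following is a minimal illustrative sketch, not the selection procedure used by smart selection itself. It assumes a hypothetical mapping `subsumes`, in which `subsumes[g]` lists the goals that are guaranteed to be covered whenever goal `g` is covered, and it assumes this relation is acyclic. The goal names and the function `select_goals` are invented for illustration.

```python
from typing import Dict, Set


def select_goals(subsumes: Dict[str, Set[str]]) -> Set[str]:
    """Keep only the goals that no other goal subsumes.

    `subsumes[g]` is a hypothetical set of goals that are covered
    whenever goal `g` is covered. The relation is assumed to be acyclic.
    """
    # Collect every goal that appears as a key or as a subsumed goal.
    all_goals = set(subsumes)
    for covered in subsumes.values():
        all_goals |= covered

    # A goal is redundant as an objective if some other goal subsumes it.
    subsumed: Set[str] = set()
    for goal, covered in subsumes.items():
        subsumed |= covered - {goal}

    return all_goals - subsumed


# Hypothetical goals for a method `foo` containing one if/else.
subsumes = {
    "branch:foo:if-true": {"line:foo:12", "method:foo"},
    "branch:foo:if-false": {"line:foo:14", "method:foo"},
    "line:foo:12": {"method:foo"},
    "line:foo:14": {"method:foo"},
    "method:foo": set(),
}

selected = select_goals(subsumes)
print(sorted(selected))
```

In this toy input, five goals reduce to the two branch goals: covering both branches also covers the listed lines and the method, so the search has fewer objectives to optimize without dropping any goal from consideration.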