Many-Objective Search-Based Coverage-Guided Automatic Test Generation for Deep Neural Networks

To ensure the reliability of deep neural network (DNN) systems and address the problem of generating tests for neural networks, this paper proposes a fuzz-testing technique based on many-objective optimization. Traditional fuzz testing relies on random search, which lowers testing efficiency and tends to produce many invalid test cases. Many-objective optimization instead guides the search toward effective test cases. To achieve high test coverage, we propose several improvement strategies. First, a frequency-based fuzz sampling strategy assigns priorities to the initial data according to how often each item has been selected. This avoids repeatedly selecting the same data and yields higher-quality initial data than random sampling. Second, because global search may produce tests that violate semantic constraints, we propose a local search strategy based on Monte Carlo tree search (MCTS) to strengthen the algorithm's local search capability. Third, we update the external archive of SPEA2 with a decomposition-based archiving strategy, which improves population diversity and the algorithm's global search capability. To validate the approach, we conducted experiments on several public datasets and a range of neural network models. The results show that, compared with random and clustering-based sampling, the frequency-based fuzz sampling strategy yields a larger coverage improvement in the later iterations. The improved SPEA2 algorithm increases coverage by about 12% across several coverage metrics on complex networks such as VGG16, and by approximately 40% on the LeNet family of networks. The experiments also indicate that the newly generated test cases not only achieve higher coverage but also yield adversarial samples that reveal model errors.
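The abstract does not state the exact priority function behind the frequency-based fuzz sampling strategy. As a minimal sketch, assuming a seed's sampling weight decays as 1/(1 + n), where n is the number of times that seed has already been selected, the idea of favoring rarely chosen initial data could look like this. The class name `FrequencySampler` and the weighting formula are illustrative assumptions, not definitions taken from the paper.

```python
import random
from typing import List, Optional, Sequence


class FrequencySampler:
    """Hypothetical frequency-based seed sampler (illustrative assumption).

    A seed selected n times so far gets weight 1 / (1 + n), so rarely chosen
    seeds are preferred and the same seed is not drawn over and over.
    """

    def __init__(self, seeds: Sequence, rng: Optional[random.Random] = None):
        if not seeds:
            raise ValueError("need at least one seed")
        self.seeds = list(seeds)
        self.counts: List[int] = [0] * len(self.seeds)
        self.rng = rng if rng is not None else random.Random()

    def weights(self) -> List[float]:
        # Assumed priority: decays with how often the seed was already picked.
        return [1.0 / (1 + n) for n in self.counts]

    def sample(self):
        # Weighted draw: a lower selection count means a higher chance.
        idx = self.rng.choices(range(len(self.seeds)), weights=self.weights(), k=1)[0]
        self.counts[idx] += 1
        return self.seeds[idx]


# Usage: draw 300 seeds from three candidates with a fixed RNG seed.
sampler = FrequencySampler(["img_a", "img_b", "img_c"], rng=random.Random(0))
picks = [sampler.sample() for _ in range(300)]
print(sampler.counts)  # counts typically stay close to one another
```

Compared with uniform random sampling, the inverse-frequency weight acts as a self-correcting pressure: any seed that gets ahead in selections becomes less likely to be chosen next.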
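Standard SPEA2 truncates its external archive using a k-th nearest-neighbor density estimate. The abstract says the improved algorithm updates the archive with a decomposition-based strategy instead, without giving details. One common way to realize such a strategy, assumed here in the style of MOEA/D, is to keep, for each of a set of spread-out weight vectors, the candidate with the smallest Tchebycheff value relative to the ideal point. All objectives are assumed to be minimized (coverage objectives would be negated), and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def tchebycheff(f: Vector, w: Vector, z: Vector) -> float:
    """Tchebycheff scalarization max_i w_i * |f_i - z_i| (minimization)."""
    return max(wi * abs(fi - zi) for fi, wi, zi in zip(f, w, z))


def decomposition_archive_update(
    candidates: List[Vector], weights: List[Vector]
) -> List[int]:
    """Return indices of the candidates kept in the archive (hypothetical sketch).

    For every weight vector, keep the candidate that minimizes the Tchebycheff
    value with respect to the ideal point z* (the component-wise minimum over
    all candidates). One slot per weight vector spreads the archive across the
    objective space; a candidate occupies at most one slot.
    """
    m = len(candidates[0])
    z = [min(c[i] for c in candidates) for i in range(m)]
    kept: List[int] = []
    for w in weights:
        best = min(
            range(len(candidates)),
            key=lambda j: tchebycheff(candidates[j], w, z),
        )
        if best not in kept:
            kept.append(best)
    return kept


# Usage: two minimized objectives; (0.9, 0.9) is dominated by (0.5, 0.5).
cands = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0), (0.9, 0.9)]
ws = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
print(decomposition_archive_update(cands, ws))  # [0, 1, 2]
```

Because each weight vector claims its own best candidate, the archive keeps solutions in different regions of the objective space rather than crowding around one trade-off, which is the diversity effect the abstract attributes to the strategy.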
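The abstract proposes Monte Carlo tree search to strengthen local search around individuals found by the global search, but does not specify the action space or the reward. The sketch below shows only the generic MCTS loop (UCT selection, expansion, random rollout, backpropagation) over a short sequence of discrete mutation actions. The integer action set, the fixed depth, and the `reward` callback are illustrative assumptions; in a DNN setting the reward might, for example, return the coverage gain of the mutated input, or zero when a semantic constraint is violated.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Action = int
Path = Tuple[Action, ...]


class Node:
    def __init__(self, path: Path, parent: Optional["Node"] = None):
        self.path = path
        self.parent = parent
        self.children: Dict[Action, "Node"] = {}
        self.visits = 0
        self.total = 0.0


def uct_child(node: Node, c: float) -> Node:
    # UCT: exploit the mean reward, explore rarely visited children.
    return max(
        node.children.values(),
        key=lambda ch: ch.total / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts_local_search(
    actions: List[Action],
    reward: Callable[[Path], float],  # hypothetical scoring of a mutation sequence
    depth: int = 3,
    iterations: int = 200,
    c: float = 1.4,
    rng: Optional[random.Random] = None,
) -> Tuple[Path, float]:
    """Search for a mutation sequence of length `depth` with a high reward."""
    rng = rng if rng is not None else random.Random()
    root = Node(())
    best_path, best_reward = (), float("-inf")
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and not terminal.
        while len(node.path) < depth and len(node.children) == len(actions):
            node = uct_child(node, c)
        # 2. Expansion: add one untried action if the node is not terminal.
        if len(node.path) < depth:
            untried = [a for a in actions if a not in node.children]
            a = rng.choice(untried)
            child = Node(node.path + (a,), node)
            node.children[a] = child
            node = child
        # 3. Simulation: complete the sequence with random actions.
        path = node.path
        while len(path) < depth:
            path = path + (rng.choice(actions),)
        r = reward(path)
        if r > best_reward:
            best_path, best_reward = path, r
        # 4. Backpropagation: update visit counts and reward sums up to the root.
        while node is not None:
            node.visits += 1
            node.total += r
            node = node.parent
    return best_path, best_reward


# Toy usage: actions 0..3, reward = sum of the sequence (optimum is (3, 3, 3)).
path, r = mcts_local_search([0, 1, 2, 3], reward=lambda p: float(sum(p)), rng=random.Random(1))
print(path, r)
```

Unlike a purely random local mutation, the tree statistics concentrate later iterations on prefixes that have already scored well, which is the kind of focused neighborhood search the abstract motivates.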

翻译：为确保深度神经网络系统的可靠性并解决神经网络的测试生成问题，本文提出了一种基于多目标优化算法的模糊测试生成技术。传统模糊测试采用随机搜索策略，导致测试效率较低且易产生大量无效测试用例。通过运用多目标优化技术，能够生成有效的测试用例。为实现高测试覆盖率，本文提出了若干改进策略。基于频率的模糊采样策略根据初始数据被选择的频率分配优先级，避免了相同数据的重复选择，其初始数据质量优于随机采样策略。针对全局搜索可能产生不满足语义约束的测试用例问题，提出了基于蒙特卡洛树搜索的局部搜索策略以增强算法的局部搜索能力。此外，通过基于分解的归档策略更新SPEA2的外部存档，提升了种群多样性及算法的全局搜索能力。为验证所提方法的有效性，在多个公共数据集及不同神经网络模型上进行了实验。结果表明：相较于随机采样和基于聚类的采样，基于频率的模糊采样策略在迭代后期对覆盖率提升效果更显著。在VGG16等复杂网络上，改进的SPEA2算法在多项覆盖率指标上提升了约12%的覆盖率，在LeNet系列网络上提升约40%。实验结果还表明，新生成的测试用例不仅具有更高的覆盖率，还能生成揭示模型错误的对抗样本。