As neural networks are increasingly deployed in safety-critical domains, testing is essential for evaluating and improving their reliability. Existing testing methods, whether black-box or white-box, rely primarily on global mutation or coverage-guided strategies, both of which struggle to uncover diverse model failures efficiently while keeping test inputs close to the original data distribution and semantics. We propose BayesWarp, a testing framework that addresses this limitation in two ways: it mutates the decision-critical input regions identified by interpretable saliency techniques, and it adaptively guides the testing process with an uncertainty-aware Bayesian Optimization strategy. Together, these enable the discovery of diverse failures while preserving distributional and semantic proximity to the original data. An evaluation on MNIST, CIFAR-10, and ImageNet across six neural network models shows that, under a fixed mutation budget, BayesWarp improves failure discovery, failure diversity, test case quality, and critical neuron coverage, and thus overall testing effectiveness. Moreover, fine-tuning with the generated failure cases improves model performance.
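The two ideas named in the abstract, mutating only salient input regions and steering the search with an uncertainty-aware Bayesian Optimization loop, can be illustrated with a small toy sketch. Everything below is an assumption made for illustration rather than BayesWarp's actual design: the abstract does not specify the saliency method, the surrogate model, or the acquisition function. The sketch uses a linear classifier (so the gradient saliency is simply a weight row), a NumPy Gaussian-process surrogate with an RBF kernel, and a lower-confidence-bound acquisition. All function names (`margin`, `saliency_mask`, `gp_posterior`, `search`) and parameters (`k`, `eps`, `budget`, `kappa`) are hypothetical.

```python
import numpy as np

def margin(W, x, label):
    """Logit margin of `label` over the best other class; < 0 means misclassified."""
    logits = W @ x
    return logits[label] - np.delete(logits, label).max()

def saliency_mask(W, label, k):
    """Indices of the k features with the largest |d logit_label / d x| (linear model)."""
    return np.argsort(-np.abs(W[label]))[:k]

def rbf(A, B, length=0.3):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X, y, Xc, noise=1e-4):
    """Posterior mean (in standardized units) and std of a zero-mean GP at Xc."""
    z = (y - y.mean()) / (y.std() + 1e-9)      # standardize observed scores
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xc)
    sol = np.linalg.solve(K, Ks)               # K^{-1} Ks
    mu = sol.T @ z
    var = 1.0 - (Ks * sol).sum(0)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def search(W, x, label, k=4, eps=0.3, budget=30, kappa=2.0, seed=0):
    """Spend `budget` model evaluations looking for failures near x."""
    rng = np.random.default_rng(seed)
    idx = saliency_mask(W, label, k)

    def apply(theta):
        # Perturb only the salient features; keep values in the valid range.
        x_new = x.copy()
        x_new[idx] = np.clip(x[idx] + theta, 0.0, 1.0)
        return x_new

    # A few random perturbations to initialize the surrogate.
    thetas = rng.uniform(-eps, eps, size=(5, k))
    scores = np.array([margin(W, apply(t), label) for t in thetas])
    while len(thetas) < budget:
        cand = rng.uniform(-eps, eps, size=(256, k))
        mu, sd = gp_posterior(thetas, scores, cand)
        # Lower confidence bound: prefer low predicted margin or high uncertainty.
        best = cand[np.argmin(mu - kappa * sd)]
        thetas = np.vstack([thetas, best])
        scores = np.append(scores, margin(W, apply(best), label))
    failures = [apply(t) for t, s in zip(thetas, scores) if s < 0]
    return failures, idx, scores

# Toy usage: a random 3-class linear model on 16 features (purely illustrative).
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 16))
x = rng.uniform(0.0, 1.0, size=16)
label = int(np.argmax(W @ x))
failures, idx, scores = search(W, x, label)
print(f"{len(failures)} failing inputs found in {len(scores)} evaluations")
```

Restricting the perturbation to `idx` and bounding it by `eps` is what keeps every candidate close to the original input in this sketch, while the `kappa * sd` term is a simple stand-in for uncertainty-aware exploration. A real image model would need a proper saliency method and a distribution-aware mutation operator, neither of which is modeled here.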