Due to the widespread application of deep neural networks~(DNNs) in safety-critical tasks, deep learning testing has drawn increasing attention. During testing, test cases that have been fuzzed or selected using test metrics are fed into the model to find fault-inducing test units (e.g., neurons and feature maps whose activation will almost certainly result in a model error) and report them to the DNN developer, who subsequently repairs them~(e.g., by retraining the model with the test cases). Current test metrics, however, are primarily concerned with neurons, which means that test cases discovered through guided fuzzing or selection with these metrics focus on detecting fault-inducing neurons while failing to detect fault-inducing feature maps. In this work, we propose DeepFeature, which tests DNNs at the feature map level. During testing, DeepFeature scrutinizes every internal feature map in the model and identifies vulnerabilities that can be repaired to improve the model's overall performance. Extensive experiments demonstrate that (1) DeepFeature is a strong tool for detecting the model's vulnerable feature maps; (2) DeepFeature's test case selection has a high fault detection rate and can detect more types of faults~(compared with coverage-guided selection techniques, the fault detection rate increases by 49.32\%); and (3) DeepFeature's fuzzer outperforms current fuzzing techniques and generates valuable test cases more efficiently.