Successful deployment of Deep Neural Networks (DNNs), particularly in safety-critical systems, requires validating them with an adequate test set to ensure a sufficient degree of confidence in test outcomes. Mutation analysis, one of the main techniques for measuring test adequacy in traditional software, has been adapted to DNNs in recent years. This technique is based on generating mutants that aim to be representative of actual faults and thus can be used for test adequacy assessment. In this paper, we investigate for the first time whether mutation operators that directly modify the trained DNN model (i.e., post-training) can be used for reliably assessing the test inputs of DNNs. We propose and evaluate TEASMA, an approach based on post-training mutation for assessing the adequacy of DNN test sets. In practice, TEASMA allows engineers to decide whether they can trust test results and thus validate the DNN before its deployment. Based on a DNN model's training set, TEASMA provides a methodology to build accurate prediction models of the Fault Detection Rate (FDR) of a test set from its mutation score, thus enabling its assessment. Our large empirical evaluation, across multiple DNN models, shows that predicted FDR values have a strong linear correlation (R² ≥ 0.94) with actual values. Consequently, empirical evidence suggests that TEASMA provides a reliable basis for confidently deciding whether to trust test results or improve the test set.
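To make the idea of predicting FDR from a mutation score concrete, the following is a minimal sketch and not the TEASMA implementation. It assumes a simple linear least-squares model, whereas the abstract does not state which regression form is used. The function names (`mutation_score`, `fit_line`, `r_squared`, `predict_fdr`) and all numeric values are hypothetical and chosen purely for illustration; none of them come from the paper.

```python
from statistics import mean


def mutation_score(killed: int, total: int) -> float:
    """Fraction of mutants killed by a test set."""
    if total <= 0:
        raise ValueError("total must be positive")
    return killed / total


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(actual, predicted):
    """Coefficient of determination between actual and predicted values."""
    m = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical (mutation score, FDR) pairs, as might be measured on
# input subsets sampled from the training set. Illustrative values only.
train_ms = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
train_fdr = [0.12, 0.30, 0.41, 0.60, 0.72, 0.88]

slope, intercept = fit_line(train_ms, train_fdr)


def predict_fdr(ms: float) -> float:
    """Predict FDR from a mutation score, clamped to the range [0, 1]."""
    return min(1.0, max(0.0, slope * ms + intercept))


# Assess a hypothetical test set that kills 62 of 100 mutants.
test_ms = mutation_score(killed=62, total=100)
predicted = predict_fdr(test_ms)

# Goodness of fit on the illustrative data above.
fit_quality = r_squared(train_fdr, [predict_fdr(m) for m in train_ms])
print(f"mutation score={test_ms:.2f}, predicted FDR={predicted:.2f}, R^2={fit_quality:.3f}")
```

Under these assumptions, an engineer would compare the predicted FDR against a project-specific threshold to decide whether to trust the test results or improve the test set.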