Motivated by the success of traditional software testing, numerous diversity measures have been proposed for testing deep neural networks (DNNs). In this study, we propose a shift in perspective: DNN testing should be treated as a directed testing problem rather than a diversity-based testing task. The objective of testing DNNs is specific and well-defined: identifying inputs that lead to misclassifications. Consequently, a more precise testing approach prioritizes inputs with a higher potential to induce misclassifications rather than inputs that enhance "diversity." We derive six directed metrics for DNN testing. We also carefully analyze the appropriate scope of each metric, since applying a metric beyond its intended scope can significantly diminish its effectiveness. Our evaluation demonstrates that (1) diversity metrics are particularly weak indicators of buggy inputs that result from small input perturbations, and (2) our directed metrics consistently outperform diversity metrics in revealing erroneous behaviors of DNNs across all scenarios.