In deep learning, robustness measures a neural model's ability to handle slight changes in its input data; a lack of robustness can create safety hazards, especially in safety-critical applications. Assessing model robustness before deployment is essential, but existing methods tend to be either costly or imprecise. Improving safety in real-world scenarios therefore requires metrics that effectively capture a model's robustness. To address this, we compare the rigour and usage conditions of assessment methods built on different definitions of robustness. We then propose a straightforward and practical metric that uses hypothesis testing to assess probabilistic robustness, and we have integrated it into the TorchAttacks library. Through a comparative analysis of diverse robustness assessment methods, our work contributes to a deeper understanding of model robustness in safety-critical applications.
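To make the idea of a hypothesis test for probabilistic robustness concrete, the following is a minimal sketch and not the metric proposed in this work or the TorchAttacks API. It assumes perturbations are drawn uniformly from an L-infinity ball of radius `eps` and uses an exact one-sided binomial test of the null hypothesis "the misclassification probability is at least `tol`". The names `binom_cdf`, `probabilistic_robustness_test`, `toy_predict`, and all parameter defaults are hypothetical and chosen only for illustration.

```python
import math
import random


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so large n does not overflow.
    """
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return min(1.0, total)


def probabilistic_robustness_test(predict, x, eps, tol=0.05, alpha=0.01, n=500, seed=0):
    """Test H0: P(prediction changes under perturbation) >= tol.

    Assumption: perturbations are uniform in the L-infinity ball of radius eps.
    Rejecting H0 (p_value <= alpha) supports the claim that the model is
    robust at x with failure probability below tol.
    """
    rng = random.Random(seed)
    label = predict(x)
    failures = 0
    for _ in range(n):
        x_pert = [xi + rng.uniform(-eps, eps) for xi in x]
        if predict(x_pert) != label:
            failures += 1
    # Probability of seeing this few failures if the true rate were exactly tol.
    p_value = binom_cdf(failures, n, tol)
    return {"failures": failures, "n": n, "p_value": p_value, "robust": p_value <= alpha}


def toy_predict(v):
    """Hypothetical stand-in for a trained model: a fixed linear classifier."""
    return int(v[0] + v[1] > 0.0)


if __name__ == "__main__":
    # A point far from the decision boundary and one close to it.
    print(probabilistic_robustness_test(toy_predict, [2.0, 1.0], eps=0.5))
    print(probabilistic_robustness_test(toy_predict, [0.1, 0.0], eps=0.5))
```

In this sketch the cost is controlled by the sample count `n`, while `tol` and `alpha` set how strict the robustness claim is. A real assessment would replace `toy_predict` with the deployed model and choose the perturbation distribution to match the threat model under study.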