Studying the robustness of machine learning models is important for ensuring consistent model behaviour across real-world settings. Adversarial robustness, the standard framework for this purpose, views the robustness of a prediction through a binary lens: either a worst-case adversarial misclassification exists in the local region around an input, or it does not. This binary perspective does not account for degrees of vulnerability, even though data points with more misclassified examples in their neighborhoods are more vulnerable. In this work, we consider a complementary framework, called average-case robustness, which measures the fraction of points in a local region around an input that yield consistent predictions. Computing this quantity is hard, because standard Monte Carlo approaches are inefficient, especially for high-dimensional inputs. We propose the first analytical estimators of average-case robustness for multi-class classifiers. We show empirically that our estimators are accurate and efficient for standard deep learning models, and we demonstrate their usefulness for identifying vulnerable data points and for quantifying the robustness bias of models. Overall, our tools provide a complementary view of robustness, improving our ability to characterize model behaviour.
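To make the quantity concrete, the following is a minimal sketch of the naive Monte Carlo baseline that the abstract describes as inefficient; it is not the proposed analytical estimator. It assumes the local region is modelled by isotropic Gaussian noise with scale `sigma`, which is one possible choice and is not stated in the abstract. The names `mc_average_case_robustness` and `toy_predict`, and the toy linear classifier, are hypothetical and serve only as illustration.

```python
import numpy as np


def mc_average_case_robustness(predict, x, sigma=0.1, n_samples=10_000, seed=0):
    """Naive Monte Carlo estimate of average-case robustness at x.

    Returns the fraction of perturbed copies of x whose predicted label
    matches the label predicted at x itself.
    Assumption: perturbations are isotropic Gaussian with std `sigma`.
    """
    rng = np.random.default_rng(seed)
    base_label = predict(x[None, :])[0]
    noise = rng.normal(0.0, sigma, size=(n_samples, x.shape[0]))
    labels = predict(x[None, :] + noise)
    return float(np.mean(labels == base_label))


# Hypothetical 3-class linear classifier on 2-D inputs (illustration only).
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])


def toy_predict(batch):
    # Label is the index of the largest logit for each row of the batch.
    return np.argmax(batch @ W.T, axis=1)


x_far = np.array([2.0, 0.0])    # far from every decision boundary
x_near = np.array([1.0, 0.95])  # close to the class 0 / class 1 boundary

r_far = mc_average_case_robustness(toy_predict, x_far, sigma=0.3)
r_near = mc_average_case_robustness(toy_predict, x_near, sigma=0.3)
print(f"far from boundary:  {r_far:.3f}")
print(f"near the boundary:  {r_near:.3f}")
```

Both toy points may well be equally "non-robust" in the binary adversarial sense if the allowed region is large enough, yet the estimated fraction separates them by degree. Each estimate costs `n_samples` forward passes per input, which is the cost that motivates analytical estimators.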