We introduce a metric for evaluating the robustness of a classifier to adversarial perturbations, defined in terms of the classifier's expected functionality with respect to possible adversarial perturbations. A classifier is considered non-functional (that is, it has a functionality of zero) with respect to a perturbation bound if a conventional measure of performance, such as classification accuracy, falls below a minimally viable threshold when the classifier is tested on examples from that perturbation bound. Defining robustness as an expected value is motivated by a domain-general approach to robustness quantification.
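As an illustration of the definition above, the following is a minimal sketch rather than the authors' implementation. It assumes that functionality is a binary indicator (1 when accuracy meets the threshold, 0 otherwise) and that a discrete probability distribution over perturbation bounds is given. The abstract only specifies the zero case, so the value 1, the function names, and the example numbers are all hypothetical.

```python
import math
from typing import Sequence


def functionality(accuracy: float, threshold: float) -> float:
    """Return 0.0 if accuracy is below the minimally viable threshold.

    Assumption: a viable classifier is assigned functionality 1.0; the
    source only states that the non-viable case has functionality zero.
    """
    return 1.0 if accuracy >= threshold else 0.0


def expected_functionality(
    accuracies: Sequence[float],
    probabilities: Sequence[float],
    threshold: float,
) -> float:
    """Expected functionality over a discrete set of perturbation bounds.

    accuracies[i] is the accuracy measured on examples from the i-th
    perturbation bound; probabilities[i] is the (assumed) probability
    of encountering that bound.
    """
    if len(accuracies) != len(probabilities):
        raise ValueError("accuracies and probabilities must have equal length")
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(
        p * functionality(acc, threshold)
        for acc, p in zip(accuracies, probabilities)
    )


# Hypothetical accuracies at four perturbation bounds, weighted uniformly.
example_accuracies = [0.95, 0.80, 0.55, 0.30]
example_probabilities = [0.25, 0.25, 0.25, 0.25]
robustness = expected_functionality(
    example_accuracies, example_probabilities, threshold=0.6
)
```

With these illustrative numbers, the classifier is viable at the two smallest bounds only, so the expected functionality is 0.5.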