Current studies on adversarial robustness mainly focus on aggregating local robustness results from a set of data samples to evaluate and rank different models. However, these local statistics may not represent well the true global robustness over the underlying unknown data distribution. To address this challenge, this paper makes the first attempt to present a new framework, called GREAT Score, for global robustness evaluation against adversarial perturbations using generative models. Formally, GREAT Score is a global statistic that captures the mean certified attack-proof perturbation level over all samples drawn from a generative model. For finite-sample evaluation, we also derive a probabilistic guarantee on the sample complexity and on the difference between the sample mean and the true mean. GREAT Score has several advantages: (1) Robustness evaluations using GREAT Score are efficient and scalable to large models because they do not require running adversarial attacks. In particular, we show that GREAT Score correlates highly with the attack-based model ranking on RobustBench (Croce et al., 2021) at a significantly reduced computation cost. (2) The use of generative models facilitates the approximation of the unknown data distribution. In our ablation study with different generative adversarial networks (GANs), we observe consistency between the global robustness evaluation and the quality of the GANs. (3) GREAT Score can be used for remote auditing of privacy-sensitive black-box models, as demonstrated by our robustness evaluation of several online facial recognition services.
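The finite-sample guarantee mentioned in the abstract can be illustrated with a standard concentration bound. The abstract does not say which inequality is used, so the sketch below assumes Hoeffding's inequality and assumes each per-sample score lies in a known interval [lo, hi]. The function names, and the `sample_score` callable standing in for "draw one sample from the generative model and compute its certified perturbation level", are hypothetical.

```python
import math
from typing import Callable, Tuple


def hoeffding_sample_size(epsilon: float, delta: float, lo: float, hi: float) -> int:
    """Smallest n with P(|sample mean - true mean| >= epsilon) <= delta.

    Assumes i.i.d. scores bounded in [lo, hi] and uses Hoeffding's inequality:
    P(|mean_n - mu| >= epsilon) <= 2 * exp(-2 * n * epsilon^2 / (hi - lo)^2).
    """
    width = hi - lo
    return math.ceil(width ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def hoeffding_half_width(n: int, delta: float, lo: float, hi: float) -> float:
    """Half-width of the two-sided (1 - delta) Hoeffding interval for n samples."""
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def estimate_global_score(
    sample_score: Callable[[], float],  # hypothetical: one generated sample -> local score
    n: int,
    delta: float,
    lo: float,
    hi: float,
) -> Tuple[float, float]:
    """Return (sample mean of n local scores, Hoeffding half-width at level delta)."""
    scores = [sample_score() for _ in range(n)]
    mean = sum(scores) / n
    return mean, hoeffding_half_width(n, delta, lo, hi)
```

Under these assumptions, scores bounded in [0, 1] with epsilon = 0.1 and delta = 0.05 require 185 samples. The bound depends only on the score range, the tolerance, and the confidence level, not on the size of the model being evaluated.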