Deep neural networks are known to be vulnerable to adversarial attacks (AA). For an image recognition task, this means that a small perturbation of the original image can cause it to be misclassified. The design of such attacks, as well as methods of adversarial training against them, is the subject of intense research. We recast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions leveraging recent insights from DRO sensitivity analysis. We consider a set of distributional threat models. Unlike traditional pointwise attacks, which assume a uniform bound on the perturbation of each input data point, distributional threat models allow attackers to perturb inputs in a non-uniform way. We link these more general attacks with questions of out-of-sample performance and Knightian uncertainty. To evaluate the distributional robustness of neural networks, we propose a first-order AA algorithm and its multi-step version. Our attack algorithms include the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) as special cases. Furthermore, we provide a new asymptotic estimate of the adversarial accuracy against distributional threat models. The bound is fast to compute and first-order accurate, offering new insights even for pointwise AA. It also naturally yields out-of-sample performance guarantees. We conduct numerical experiments on the CIFAR-10 dataset using DNNs from RobustBench to illustrate our theoretical results. Our code is available at https://github.com/JanObloj/W-DRO-Adversarial-Methods.
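The abstract notes that FGSM arises as a special case of the proposed first-order attack. As a point of reference only, the following is a minimal NumPy sketch of the standard pointwise FGSM on a toy logistic-regression classifier. It is not the authors' distributional algorithm; the model, the synthetic data, and the helper names (`loss_grad_x`, `fgsm`, `accuracy`) are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_x(w, b, x, y):
    """Gradient of the binary cross-entropy loss w.r.t. the inputs x
    for a logistic model p = sigmoid(x @ w + b)."""
    p = sigmoid(x @ w + b)                  # shape (n,)
    return (p - y)[:, None] * w[None, :]    # shape (n, d)

def fgsm(w, b, x, y, eps):
    """One-step l_inf attack: shift every input by eps along the sign
    of the loss gradient (same budget eps for every data point)."""
    return x + eps * np.sign(loss_grad_x(w, b, x, y))

def accuracy(w, b, x, y):
    """Fraction of inputs whose predicted label (logit > 0) matches y."""
    return float(np.mean((x @ w + b > 0) == (y == 1)))

# Toy setup (hypothetical): labels are generated by the model itself,
# so the clean accuracy is perfect by construction.
rng = np.random.default_rng(0)
w = np.array([1.0, -2.0])
b = 0.1
x = rng.normal(size=(200, 2))
y = (x @ w + b > 0).astype(float)

x_adv = fgsm(w, b, x, y, eps=0.1)
clean_acc = accuracy(w, b, x, y)
adv_acc = accuracy(w, b, x_adv, y)
print(f"clean accuracy: {clean_acc:.3f}, adversarial accuracy: {adv_acc:.3f}")
```

Here every input receives the same budget `eps`, which is the uniform pointwise bound the abstract contrasts with. A distributional threat model would instead constrain the perturbation in a Wasserstein sense, allowing the attacker to perturb some inputs more than others.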