Constructing adversarial attacks on neural networks is a crucial challenge for their deployment in various services. Estimating the adversarial robustness of a neural network requires a fast and efficient way to construct such attacks. Because constructing an adversarial attack is formalized as a specific optimization problem, we approach the construction of efficient and effective attacks from a numerical optimization perspective. Specifically, we propose using advanced projection-free methods, known as modified Frank-Wolfe methods, to construct white-box adversarial attacks on given input data. We evaluate these methods theoretically and numerically and compare them with standard approaches based on projection operations or geometric intuition. Numerical experiments are performed on the MNIST and CIFAR-10 datasets using a multiclass logistic regression model, convolutional neural networks (CNNs), and a Vision Transformer (ViT).
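To make the projection-free idea concrete, the following is a minimal sketch of a plain (unmodified) Frank-Wolfe attack on a multiclass logistic regression model under an L-infinity perturbation budget. It is not the modified variants studied in the paper; the function names (`loss_and_grad`, `frank_wolfe_attack`), the budget `eps`, the step-size rule `2 / (k + 2)`, and the choice of the L-infinity ball are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def loss_and_grad(x, W, b, y):
    """Cross-entropy loss of a linear softmax classifier and its gradient w.r.t. the input x."""
    p = softmax(W @ x + b)
    loss = -np.log(p[y] + 1e-12)
    p[y] -= 1.0              # gradient w.r.t. logits: softmax minus one-hot label
    grad = W.T @ p           # chain rule through the linear layer
    return loss, grad

def frank_wolfe_attack(x0, y, W, b, eps=0.1, steps=20):
    """Maximize the loss over {x : ||x - x0||_inf <= eps} without any projection step.

    Illustrative assumption: vanilla Frank-Wolfe with the classic 2/(k+2) step size.
    """
    x = x0.copy()
    for k in range(steps):
        _, g = loss_and_grad(x, W, b, y)
        # Linear maximization oracle: the maximizer of <g, s> over the
        # L-infinity ball around x0 is a vertex of that ball.
        s = x0 + eps * np.sign(g)
        gamma = 2.0 / (k + 2.0)
        # Convex combination of two feasible points stays feasible.
        x = x + gamma * (s - x)
    return x
```

Each iterate is a convex combination of points inside the ball, so feasibility holds by construction. Projection-based methods such as projected gradient ascent instead need an explicit projection after every step.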