Deep neural networks (DNNs) are widely used today, but they are vulnerable to adversarial attacks. Developing effective defenses requires understanding the potential weak spots of DNNs. Attacks are often constructed with knowledge of the model architecture (the white-box approach) and rely on gradient methods, but for real-world DNNs this approach is infeasible in most cases. Gradient-free optimization algorithms can instead be used to attack black-box models; however, classical methods are often ineffective in the multidimensional case. In this work, we propose organizing black-box attacks on computer vision models with an optimizer based on the low-rank tensor train (TT) format, which has gained popularity in various practical multidimensional applications in recent years. Combined with an attribution of the target image built by an auxiliary (white-box) model, the TT-based optimization method yields an effective black-box attack through small perturbations of pixels in the target image. We demonstrate the superiority of the proposed approach over three popular baselines for five modern DNNs on the ImageNet dataset.
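The attack pipeline described in the abstract can be illustrated with a minimal sketch. It is not the proposed method: plain random search is swapped in for the TT-based optimizer, and every name here (`select_pixels`, `apply_perturbation`, `black_box_attack`, `score`, `d`, `eps`, `budget`) is hypothetical. The sketch assumes that `score` returns the black-box model's confidence in the correct class, that `attribution` has the same shape as the image and comes from some auxiliary model, and that pixel values lie in [0, 1].

```python
import numpy as np


def select_pixels(attribution, d):
    """Return flat indices of the d entries with the largest attribution."""
    return np.argsort(attribution.ravel())[-d:]


def apply_perturbation(image, pixel_idx, choices, eps):
    """Shift each selected pixel by choices[i] * eps, with choices[i] in {-1, 0, 1}."""
    adv = image.copy().ravel()
    adv[pixel_idx] += eps * choices
    # Keep the result a valid image with values in [0, 1].
    return np.clip(adv, 0.0, 1.0).reshape(image.shape)


def black_box_attack(score, image, attribution, d=16, eps=0.1, budget=500, seed=0):
    """Gradient-free search that only queries `score` (assumed black box).

    Random search is used here as a stand-in; the abstract's method uses a
    TT-based optimizer for this step instead.
    """
    rng = np.random.default_rng(seed)
    idx = select_pixels(attribution, d)
    best_adv, best_score = image, score(image)
    for _ in range(budget):
        # One candidate: a discrete perturbation of the d selected pixels.
        choices = rng.integers(-1, 2, size=d)
        adv = apply_perturbation(image, idx, choices, eps)
        s = score(adv)
        if s < best_score:  # lower confidence in the correct class is better
            best_adv, best_score = adv, s
    return best_adv, best_score
```

Restricting the search to the `d` highest-attribution pixels is what keeps the discrete search space small enough for a gradient-free optimizer, and the bound `eps` keeps the perturbation of each pixel small.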