Query Efficient Decision Based Sparse Attacks Against Black-Box Deep Learning Models

Source: arXiv. Published as a conference paper at the International Conference on Learning Representations (ICLR 2022). Code is available at https://sparseevoattack.github.io/

Despite our best efforts, deep learning models remain highly vulnerable to even tiny adversarial perturbations applied to the inputs. The ability to craft adversarial perturbations against a black-box model using only information extracted from its output is a practical threat to real-world systems, such as autonomous cars or machine learning models exposed as a service (MLaaS). Of particular interest are sparse attacks. Realizing sparse attacks against black-box models demonstrates that machine learning models are more vulnerable than we believe, because these attacks aim to minimize the number of perturbed pixels (measured by the l_0 norm) required to mislead a model while observing only the decision (the predicted label) returned for a model query: the so-called decision-based attack setting. However, such an attack leads to an NP-hard optimization problem. We develop an evolution-based algorithm, SparseEvo, for the problem and evaluate it against both convolutional deep neural networks and vision transformers. Notably, vision transformers are yet to be investigated under a decision-based attack setting. SparseEvo requires significantly fewer model queries than the state-of-the-art sparse attack Pointwise for both untargeted and targeted attacks. The attack algorithm, although conceptually simple, is also competitive, with only a limited query budget, against state-of-the-art gradient-based white-box attacks in standard computer vision tasks such as ImageNet. Importantly, the query-efficient SparseEvo, along with decision-based attacks in general, raises new questions regarding the safety of deployed systems and poses new directions to study and understand the robustness of machine learning models.
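
To make the decision-based sparse setting concrete, here is a minimal toy sketch in Python. It is not the authors' SparseEvo implementation: the abstract only states that the method is evolution-based, so the binary-mask encoding, the uniform crossover, the single-pixel mutation, the population size, and the linear `toy_predict` classifier are all illustrative assumptions. The sketch only shows the shape of the problem: the attacker sees nothing but the predicted label, and searches for a perturbation that changes as few pixels as possible (small l_0 distance) while staying misclassified.

```python
import random


def blend(source, start, mask):
    """Copy pixel i from `start` where mask[i] == 1, otherwise keep `source`."""
    return [t if m else s for s, t, m in zip(source, start, mask)]


def l0_distance(a, b):
    """Count the coordinates where a and b differ."""
    return sum(1 for p, q in zip(a, b) if p != q)


def sparse_decision_attack(predict, source, start, true_label,
                           pop_size=8, max_queries=2000, seed=0):
    """Toy evolutionary search (an assumption, not the paper's algorithm)."""
    rng = random.Random(seed)
    queries = 0

    def fitness(mask):
        # One model query per call; only the predicted label is observed.
        nonlocal queries
        queries += 1
        candidate = blend(source, start, mask)
        if predict(candidate) == true_label:
            return float("inf")  # not adversarial, so never preferred
        return l0_distance(candidate, source)

    # Start from the full mask: the start image itself must be misclassified.
    full = [1] * len(source)
    base = fitness(full)
    if base == float("inf"):
        raise ValueError("start must be classified differently from source")
    population = [(base, full[:]) for _ in range(pop_size)]

    for _ in range(max_queries - queries):
        population.sort(key=lambda item: item[0])
        best = population[0][1]
        partner = rng.choice(population)[1]
        # Recombination: uniform crossover between the best mask and a partner.
        child = [b if rng.random() < 0.5 else p for b, p in zip(best, partner)]
        active = [i for i, m in enumerate(child) if m]
        if not active:
            continue  # an empty mask reproduces the source; skip the query
        # Mutation: restore one perturbed pixel to its source value.
        child[rng.choice(active)] = 0
        score = fitness(child)
        if score < population[-1][0]:
            population[-1] = (score, child)  # replace the worst individual

    population.sort(key=lambda item: item[0])
    best_score, best_mask = population[0]
    return blend(source, start, best_mask), best_score, queries


# Hypothetical black-box model: a linear score in which three pixels matter.
WEIGHTS = [5.0, 4.0, 3.0] + [0.0] * 29


def toy_predict(x):
    return 1 if sum(w * v for w, v in zip(WEIGHTS, x)) > 2.5 else 0


source = [0.0] * 32  # classified as label 0
start = [1.0] * 32   # classified as label 1 (untargeted starting point)
adv, l0, used = sparse_decision_attack(toy_predict, source, start, true_label=0)
```

After the call, `adv` is still misclassified by `toy_predict`, `l0` is the number of pixels in which it differs from `source`, and `used` is the number of label queries spent. Treating non-adversarial candidates as having infinite fitness is one simple way to keep the search inside the adversarial region using labels alone.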
