Machine learning models are critically susceptible to evasion attacks from adversarial examples. Adversarial examples, modified inputs that are deceptively similar to the original input, are generally constructed in whitebox settings by adversaries with full access to the model. However, recent blackbox attacks have shown a remarkable reduction in the number of queries needed to craft adversarial examples. Particularly alarming is the ability to exploit the classification decision exposed through the access interface of a trained model, as offered by a growing number of Machine Learning as a Service providers, including Google, Microsoft and IBM, and used by a plethora of applications incorporating these models. An attack in which the adversary exploits only the predicted label from a model to craft adversarial examples is known as a decision-based attack. In our study, we first take a deep dive into recent state-of-the-art decision-based attacks published at ICLR and SP to highlight the costly nature of discovering low-distortion adversarial examples with gradient estimation methods. We then develop a robust, query-efficient attack capable of avoiding entrapment in a local minimum and misdirection from the noisy gradients seen in gradient estimation methods. The attack method we propose, RamBoAttack, exploits the notion of Randomized Block Coordinate Descent to explore the hidden classifier manifold, targeting perturbations to manipulate only localized input features and thereby address the issues of gradient estimation methods. Importantly, RamBoAttack is more robust to the different sample inputs available to an adversary and to the targeted class. Overall, for a given target class, RamBoAttack is demonstrated to be more robust at achieving a lower distortion within a given query budget. We curate our extensive results using the large-scale, high-resolution ImageNet dataset and open-source our attack, test samples and artifacts on GitHub.
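To make the idea of a decision-based attack driven by randomized block coordinate descent more concrete, the following is a minimal sketch in Python with NumPy. It is not the RamBoAttack implementation. The function name `block_coordinate_attack`, the `predict_label` oracle, and the `block`, `step` and `budget` parameters are all hypothetical, chosen for illustration only. The sketch assumes a targeted setting in which the adversary starts from an input already classified as the target class and may only query the predicted label.

```python
import numpy as np


def block_coordinate_attack(predict_label, x_src, x_start, target,
                            block=4, step=0.5, budget=2000, seed=0):
    """Illustrative hard-label sketch (hypothetical, not the paper's algorithm).

    Shrinks the distance between x_adv and x_src by updating one randomly
    chosen square block of coordinates per query, keeping an update only if
    the oracle still returns the target label.
    """
    rng = np.random.default_rng(seed)
    x_src = np.asarray(x_src, dtype=float)
    x_adv = np.asarray(x_start, dtype=float).copy()

    # One query to confirm the starting point is already adversarial.
    if predict_label(x_adv) != target:
        raise ValueError("x_start must be classified as the target class")
    queries = 1

    height, width = x_adv.shape[:2]
    while queries < budget:
        # Pick the top-left corner of a random block (upper bound exclusive).
        row = rng.integers(0, height - block + 1)
        col = rng.integers(0, width - block + 1)
        region = (slice(row, row + block), slice(col, col + block))

        # Move only the selected block part of the way toward the source.
        candidate = x_adv.copy()
        candidate[region] += step * (x_src[region] - candidate[region])

        # The only feedback used is the predicted label (decision-based).
        queries += 1
        if predict_label(candidate) == target:
            x_adv = candidate

    return x_adv, queries
```

Because every accepted update moves a block closer to the source image, the distortion of `x_adv` never increases in this sketch, and no gradient estimate is needed. A practical attack would add further components, such as an adaptive block size or step schedule, which are omitted here.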