Current adversarial attacks for multi-class classifiers choose the target class for a given input naively, based on the classifier's confidence levels for the various candidate classes. We present a novel adversarial targeting method, \textit{MALT - Mesoscopic Almost Linearity Targeting}, based on medium-scale almost-linearity assumptions. Our attack outperforms the current state-of-the-art AutoAttack on the standard benchmark datasets CIFAR-100 and ImageNet, across a variety of robust models. In particular, our attack is \emph{five times faster} than AutoAttack, while matching all of AutoAttack's successes and additionally attacking samples that were previously out of reach. We then prove formally and demonstrate empirically that our targeting method, although inspired by linear predictors, also applies to standard non-linear models.