"Ensemble everything everywhere" is a recently proposed defense against adversarial examples, designed to make image classifiers robust. The defense works by ensembling a model's intermediate representations across multiple noisy image resolutions, producing a single robust classification. It was shown to be effective against multiple state-of-the-art attacks. Perhaps more convincingly, the model's gradients were shown to be perceptually aligned: attacks against the model produce noise that perceptually resembles the targeted class. In this short note, we show that this defense is not robust to adversarial attack. We first show that the defense's randomness and ensembling method cause severe gradient masking. We then use standard adaptive attack techniques to reduce the defense's robust accuracy from 48% to 14% on CIFAR-100 and from 62% to 11% on CIFAR-10, under the $\ell_\infty$-norm threat model with $\varepsilon = 8/255$.
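A standard adaptive-attack recipe against randomized defenses of this kind combines projected gradient descent (PGD) with Expectation over Transformation (EOT): gradients are averaged over the defense's randomness before each step, which counteracts the gradient masking that the randomness would otherwise induce. The sketch below illustrates the idea on a toy randomized linear classifier; the model, the Gaussian input noise standing in for the defense's stochastic multi-resolution ensembling, and all hyperparameter values are illustrative assumptions, not the paper's actual defense or attack configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy randomized "classifier": logits = W @ (x + noise).
# The additive noise is a hypothetical stand-in for the defense's
# stochastic multi-resolution ensembling.
NUM_CLASSES, DIM = 10, 32
W = rng.normal(size=(NUM_CLASSES, DIM))

def loss_grad(x, y, noise):
    """Gradient of cross-entropy loss w.r.t. the input, for one noise draw."""
    z = W @ (x + noise)
    p = np.exp(z - z.max())
    p /= p.sum()                       # softmax probabilities
    onehot = np.zeros(NUM_CLASSES)
    onehot[y] = 1.0
    return W.T @ (p - onehot)          # d(CE loss)/dx for a linear model

def pgd_eot(x, y, eps=8/255, alpha=2/255, steps=20, eot_samples=8):
    """Untargeted PGD with EOT: average gradients over the defense's
    randomness before each sign-gradient ascent step."""
    x_adv = x.copy()
    for _ in range(steps):
        grads = [loss_grad(x_adv, y, rng.normal(scale=0.1, size=x.shape))
                 for _ in range(eot_samples)]
        g = np.mean(grads, axis=0)                 # EOT gradient estimate
        x_adv = x_adv + alpha * np.sign(g)         # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project to l_inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # stay a valid image
    return x_adv

x = rng.uniform(size=DIM)
x_adv = pgd_eot(x, y=3)
```

Averaging over several noise draws per step is what makes the attack "adaptive" here: a single-sample gradient through a randomized model is a noisy estimate, and naive PGD on it stalls, giving the appearance of robustness.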