Deep neural networks (DNNs) can be deceived by human-imperceptible perturbations of clean samples. Enhancing the robustness of DNNs against adversarial attacks is therefore a crucial task. In this paper, we aim to train robust DNNs by limiting the set of outputs reachable via a norm-bounded perturbation added to a clean sample. We refer to this set as the adversarial polytope; each clean sample has its own adversarial polytope. If the polytopes of all samples are compact enough that they do not intersect the decision boundaries of the DNN, then the DNN is robust against adversarial samples. Hence, our algorithm is based on learning \textbf{c}onfined \textbf{a}dversarial \textbf{p}olytopes (CAP). Through a thorough set of experiments, we demonstrate that CAP is more effective than existing adversarial robustness methods at improving the robustness of models against state-of-the-art attacks, including AutoAttack.
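One possible way to formalize this set, in notation assumed here purely for illustration (the symbols $f_\theta$, $\epsilon$, $p$, and $\mathcal{Z}_\epsilon$ do not appear in the text above): for a network $f_\theta$ that maps an input to a vector of class scores, the adversarial polytope of a clean sample $x$ with label $y$ could be written as
\[
\mathcal{Z}_\epsilon(x) = \{\, f_\theta(x + \delta) : \|\delta\|_p \le \epsilon \,\},
\]
and the condition that this set does not intersect the decision boundaries would then read
\[
\arg\max_k z_k = y \quad \text{for all } z \in \mathcal{Z}_\epsilon(x).
\]
Under this reading, confining $\mathcal{Z}_\epsilon(x)$ for every training sample is what keeps all perturbed outputs on the correct side of the decision boundaries.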