Deep neural networks (DNNs) can be deceived by human-imperceptible perturbations of clean samples. Enhancing the robustness of DNNs against adversarial attacks is therefore a crucial task. In this paper, we aim to train robust DNNs by limiting the set of outputs reachable via a norm-bounded perturbation added to a clean sample. We refer to this set as the adversarial polytope; each clean sample has its own adversarial polytope. If the polytopes of all samples are compact enough that they do not intersect the decision boundaries of the DNN, then the DNN is robust against adversarial samples. Hence, our algorithm is based on learning \textbf{c}onfined \textbf{a}dversarial \textbf{p}olytopes (CAP). Through a thorough set of experiments, we demonstrate that CAP improves model robustness over existing adversarial robustness methods against state-of-the-art attacks, including AutoAttack.
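As an illustration, the set of reachable outputs can be written down explicitly. The notation here is assumed for this sketch and is not fixed by the abstract: $f_\theta$ denotes the DNN, $\epsilon$ the perturbation budget, $\|\cdot\|_p$ the norm that bounds the perturbation, and $\mathcal{Z}_\epsilon(x)$ the adversarial polytope of a clean sample $x$:
\[
\mathcal{Z}_\epsilon(x) = \left\{ f_\theta(x + \delta) \;:\; \|\delta\|_p \le \epsilon \right\}.
\]
Under this notation, the DNN is robust at a labeled sample $(x, y)$ when every reachable output yields the correct label,
\[
\arg\max_k z_k = y \quad \text{for all } z \in \mathcal{Z}_\epsilon(x),
\]
which holds when $\mathcal{Z}_\epsilon(x)$ does not cross a decision boundary.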