The vulnerability of Deep Neural Networks (DNNs) to adversarial perturbations is a major flaw that raises questions about their reliability in real-world scenarios. In contrast, human perception, which DNNs are supposed to emulate, is highly robust to such perturbations, indicating that there may be certain features of human perception that make it robust but are not represented in the current class of DNNs. One such feature is that the activity of biological neurons is correlated, and the structure of this correlation tends to be rather rigid over long spans of time, even if it hampers performance and learning. We hypothesize that integrating such constraints on the activations of a DNN would improve its adversarial robustness, and, to test this hypothesis, we have developed the Self-Consistent Activation (SCA) layer, which comprises neurons whose activations are consistent with each other, as they conform to a fixed, but learned, covariability pattern. When evaluated on image and sound recognition tasks, the models with an SCA layer achieved high accuracy and exhibited significantly greater robustness than multi-layer perceptron models to state-of-the-art Auto-PGD adversarial attacks \textit{without being trained on adversarially perturbed data}.