A new method for detecting adversarial attacks is proposed for an ensemble of Deep Neural Networks (DNNs) solving two-class pattern recognition problems. The ensemble is combined using Walsh coefficients, which are capable of approximating Boolean functions and thereby controlling the complexity of the ensemble decision boundary. The hypothesis in this paper is that decision boundaries with high curvature allow adversarial perturbations to be found, but that these perturbations change the curvature of the decision boundary, which is then approximated by Walsh coefficients in a different way than for the clean images. By observing the difference in Walsh coefficient approximation between clean and adversarial images, it is shown experimentally that transferability of attack may be used for detection. Furthermore, approximating the decision boundary may aid in understanding the learning and transferability properties of DNNs. While the experiments here use images, the proposed approach of modelling two-class ensemble decision boundaries could in principle be applied to any application area. Code for approximating Boolean functions using Walsh coefficients: https://doi.org/10.24433/CO.3695905.v1
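To make the idea of approximating a Boolean function with Walsh coefficients concrete, the following is a minimal sketch and not the code at the DOI above. It assumes the standard Walsh (Fourier) expansion of a function f: {0,1}^n -> {-1,+1}, where each input bit could stand for the binary decision of one base classifier; the function names `walsh_coefficients` and `approximate`, and the use of a 3-input majority vote as the target function, are illustrative assumptions.

```python
from itertools import combinations, product


def walsh_coefficients(f, n):
    """Return {subset: coefficient} for f: {0,1}^n -> {-1,+1}.

    The coefficient for a subset S of input indices is the average over all
    2^n inputs x of f(x) * (-1)^(sum of x_i for i in S).
    """
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for order in range(n + 1):
        for subset in combinations(range(n), order):
            total = 0
            for x in points:
                parity = sum(x[i] for i in subset) % 2
                total += f(x) * (-1 if parity else 1)
            coeffs[subset] = total / len(points)
    return coeffs


def approximate(coeffs, x, max_order):
    """Evaluate the Walsh expansion at x, keeping terms up to max_order."""
    value = 0.0
    for subset, w in coeffs.items():
        if len(subset) <= max_order:
            parity = sum(x[i] for i in subset) % 2
            value += w * (-1 if parity else 1)
    return value


def majority(x):
    """Hypothetical target: majority vote of three binary decisions."""
    return 1 if sum(x) >= 2 else -1


coeffs = walsh_coefficients(majority, 3)
for x in product((0, 1), repeat=3):
    low = approximate(coeffs, x, max_order=1)   # first-order terms only
    full = approximate(coeffs, x, max_order=3)  # all terms, exact
    print(x, majority(x), low, full)
```

Truncating the expansion at a low order limits the interactions between inputs that the approximation can represent, which is one way to read the abstract's remark about controlling the complexity of the ensemble decision boundary. How the paper selects the order and applies the coefficients to clean versus adversarial images is not specified here.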