Deep neural networks are known to be vulnerable to small adversarial perturbations of test data. To defend against adversarial attacks, probabilistic classifiers have been proposed as an alternative to deterministic ones. However, the literature reports conflicting findings on how effective probabilistic classifiers are compared with deterministic ones. In this paper, we clarify the role of randomization in building adversarially robust classifiers. Given a base hypothesis set of deterministic classifiers, we show the conditions under which a randomized ensemble outperforms the hypothesis set in adversarial risk, extending previous results. Additionally, we show that for any probabilistic binary classifier (including randomized ensembles), there exists a deterministic classifier that outperforms it. Finally, we give an explicit description of the deterministic hypothesis set that contains such a deterministic classifier for many types of commonly used probabilistic classifiers, namely randomized ensembles and parametric/input noise injection.
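The abstract states these results without giving the constructions, so the following is only a toy sketch and not the paper's method. It assumes a one-dimensional integer input space, an adversary that can shift an input by at most `EPS`, and a deterministic classifier obtained by thresholding the ensemble's probability of predicting label 1; the thresholding step, the data, and every name (`f1`, `f2`, `prob_one`, `threshold`, and so on) are hypothetical choices made for illustration.

```python
from fractions import Fraction

# Toy setup: all names and numbers are hypothetical, chosen for illustration only.
# Inputs are integers; the adversary may shift x by at most EPS.
EPS = 1
DATA = [(0, 1, Fraction(1, 2)), (5, 0, Fraction(1, 2))]  # (x, label, probability mass)


def f1(x):
    """Base deterministic classifier: predicts 1 on x <= 0."""
    return 1 if x <= 0 else 0


def f2(x):
    """Base deterministic classifier: predicts 1 on 0 <= x <= 1."""
    return 1 if 0 <= x <= 1 else 0


BASE = [f1, f2]
WEIGHTS = [Fraction(1, 2), Fraction(1, 2)]  # mixture weights of the randomized ensemble


def ball(x):
    """Perturbation set around x (assumed to be an integer interval)."""
    return range(x - EPS, x + EPS + 1)


def prob_one(x):
    """Probability that the randomized ensemble outputs label 1 at x."""
    return sum(w * f(x) for f, w in zip(BASE, WEIGHTS))


def adversarial_risk(p):
    """Adversarial 0-1 risk of a classifier described by p(x) = P(output 1 at x)."""
    total = Fraction(0)
    for x, y, mass in DATA:
        # Worst-case error probability over the perturbation set of x.
        worst = max(1 - p(z) if y == 1 else p(z) for z in ball(x))
        total += mass * worst
    return total


def threshold(alpha):
    """Deterministic classifier: predict 1 where the ensemble does so with probability > alpha."""
    return lambda x: 1 if prob_one(x) > alpha else 0


base_risks = [adversarial_risk(f) for f in BASE]
ensemble_risk = adversarial_risk(prob_one)
thresholded_risk = adversarial_risk(threshold(Fraction(1, 4)))
print(base_risks, ensemble_risk, thresholded_risk)
```

In this toy case each base classifier has adversarial risk 1/2, the uniform randomized ensemble has risk 1/4, and the thresholded deterministic classifier has risk 0. The thresholded classifier is not one of the two base classifiers, which is consistent with the abstract's point that the better deterministic classifier may lie in a larger hypothesis set than the base one.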