To tackle the susceptibility of deep neural networks to adversarial examples, adversarial training has been proposed: it provides a notion of security by embedding an inner maximization problem, which models first-order adversaries, within the outer minimization of the training loss. To generalize adversarial robustness across perturbation types, adversarial training has been augmented with an improved inner maximization over a union of multiple perturbations, e.g., various $\ell_p$ norm-bounded perturbations. However, this improved inner maximization offers only limited flexibility in the perturbation types it can accommodate. In this work, we assemble, through a gating mechanism, a set of expert networks, each either adversarially trained to handle a particular perturbation type or normally trained to boost accuracy on clean data. The gating module dynamically assigns weights to each expert so as to achieve superior accuracy under various input types, e.g., adversarial examples, adverse weather perturbations, and clean inputs. To avoid the obfuscated-gradients issue, the gating module is trained jointly with fine-tuning of the last fully connected layers of the expert networks via adversarial training. Through extensive experiments, we show that our Mixture of Robust Experts (MoRE) approach enables flexible integration of a broad range of robust experts with superior performance.
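The min-max training problem referred to above can be written out explicitly. Assuming the standard notation (not fixed by the abstract itself), with $f_\theta$ the network, $\mathcal{L}$ the training loss, $\mathcal{D}$ the data distribution, and $\mathcal{S}$ the allowed perturbation set:

$$\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\delta \in \mathcal{S}} \mathcal{L}\big(f_\theta(x+\delta),\, y\big) \Big], \qquad \mathcal{S} \;=\; \bigcup_{p} \{\delta : \|\delta\|_p \le \epsilon_p\},$$

where the union over $p$ corresponds to the "union of multiple perturbations" extension of the inner maximization described above; a single $\ell_p$ ball recovers the original adversarial training formulation.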
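The dynamic weighting performed by the gating module can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the class name, the use of linear "expert heads" in place of full expert networks, and the linear gate are all illustrative stand-ins, not the paper's architecture.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class MixtureOfRobustExperts:
    """Illustrative sketch: a gate assigns soft weights over expert outputs.

    In the paper, each expert is a full network (adversarially trained for
    one perturbation type, or normally trained for clean accuracy); here each
    expert is stood in for by a random linear classifier head.
    """

    def __init__(self, num_experts, feat_dim, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.expert_heads = [rng.standard_normal((feat_dim, num_classes)) * 0.1
                             for _ in range(num_experts)]
        self.gate = rng.standard_normal((feat_dim, num_experts)) * 0.1

    def forward(self, x):
        # Gate produces per-input weights over experts; rows sum to 1.
        weights = softmax(x @ self.gate)                               # (B, K)
        # Stack expert logits and combine them with the gate weights.
        logits = np.stack([x @ W for W in self.expert_heads], axis=1)  # (B, K, C)
        return (weights[:, :, None] * logits).sum(axis=1)              # (B, C)
```

In the paper's setting, the gate and the experts' last fully connected layers would be the components updated adversarially to avoid obfuscated gradients; everything upstream stays frozen.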