Ensemble strategies have become popular in adversarial defense: multiple base classifiers are trained to defend cooperatively against adversarial attacks. Despite their empirical success, it remains theoretically unclear why an ensemble of adversarially trained classifiers is more robust than a single one. To fill this gap, we develop a new error theory dedicated to understanding ensemble adversarial defense, and show a provable reduction in 0-1 loss on challenging sample sets in an adversarial defense scenario. Guided by this theory, we propose an effective approach to improving ensemble adversarial defense, named interactive global adversarial training (iGAT). iGAT includes (1) a probabilistic distributing rule that selectively allocates adversarial examples that are globally challenging to the ensemble among the different base classifiers, and (2) a regularization term that addresses the most severe weaknesses of the base classifiers. When applied to various existing ensemble adversarial defense techniques, iGAT improves their performance by up to 17% on the CIFAR10 and CIFAR100 datasets under both white-box and black-box attacks.
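To make the idea of a probabilistic distributing rule more concrete, the following is a minimal illustrative sketch, not the iGAT rule itself. It assumes that an example is "globally challenging" when the ensemble misclassifies it, and that each such example is sent to base classifier k with probability proportional to the probability that classifier assigns to the true label. This weighting, along with the function name `distribute_challenging_examples` and its parameters, is a hypothetical choice made for illustration; the exact rule is defined in the paper.

```python
import random
from typing import Dict, List, Sequence


def distribute_challenging_examples(
    true_class_probs: Sequence[Sequence[float]],
    ensemble_correct: Sequence[bool],
    seed: int = 0,
) -> Dict[int, List[int]]:
    """Hypothetical probabilistic distributing rule (illustrative only).

    true_class_probs[i][k]: probability that base classifier k assigns to the
        true label of adversarial example i.
    ensemble_correct[i]: whether the ensemble classifies example i correctly.
    Returns a mapping from base-classifier index to the example indices it
    receives for training. Only examples the ensemble gets wrong are assigned.
    """
    rng = random.Random(seed)
    num_classifiers = len(true_class_probs[0]) if true_class_probs else 0
    allocation: Dict[int, List[int]] = {k: [] for k in range(num_classifiers)}
    for i, (probs, correct) in enumerate(zip(true_class_probs, ensemble_correct)):
        if correct:
            continue  # not challenging to the ensemble as a whole; skip it
        weights = list(probs)
        if sum(weights) <= 0:
            # Every classifier gives the true label zero probability:
            # fall back to a uniform choice.
            weights = [1.0] * num_classifiers
        # Assumed weighting: a classifier that is more confident on the true
        # label is more likely to receive this example.
        k = rng.choices(range(num_classifiers), weights=weights, k=1)[0]
        allocation[k].append(i)
    return allocation


if __name__ == "__main__":
    probs = [[0.9, 0.0, 0.0], [0.0, 0.0, 0.7], [0.2, 0.3, 0.5]]
    correct = [False, False, True]  # the ensemble gets only example 2 right
    print(distribute_challenging_examples(probs, correct))
```

In this toy input, example 2 is skipped because the ensemble already classifies it correctly. Examples 0 and 1 each have only one classifier with nonzero weight, so they go deterministically to classifiers 0 and 2 respectively. Sampling, rather than always choosing the single best classifier, keeps the allocation stochastic, which matches the "probabilistic" aspect described above.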