Adversarial training methods commonly generate independent initial perturbations for adversarial samples from a simple uniform distribution, and obtain the training batch for the classifier without any selection. In this work, we propose a simple yet effective training framework called Batch-in-Batch (BB) to enhance model robustness. Specifically, it involves a joint construction of initial values that simultaneously generates $m$ sets of perturbations from the original batch, providing more diversity for adversarial samples; it also includes various sample selection strategies that enable the trained models to have smoother losses and avoid overconfident outputs. Through extensive experiments on three benchmark datasets (CIFAR-10, SVHN, CIFAR-100) with two networks (PreActResNet18 and WideResNet28-10), each used in both single-step (Noise-Fast Gradient Sign Method, N-FGSM) and multi-step (Projected Gradient Descent, PGD-10) adversarial training, we show that models trained within the BB framework consistently achieve higher adversarial accuracy across various adversarial settings, notably an improvement of over 13% on the SVHN dataset with an attack radius of 8/255 compared to the N-FGSM baseline model. Furthermore, experimental analysis of the efficiency of both the proposed initial perturbation method and the sample selection strategies validates our insights. Finally, we show that our framework is cost-effective in terms of computational resources, even with a relatively large value of $m$.