While the literature shows that simultaneously accurate and robust classifiers exist for common datasets, previous methods for improving the adversarial robustness of classifiers often exhibit an accuracy-robustness trade-off. We build upon recent advancements in data-driven ``locally biased smoothing'' to develop classifiers that treat benign and adversarial test data differently. Specifically, we tailor the smoothing operation to use a robust neural network as the source of robustness. We then extend the smoothing procedure to the multi-class setting and adapt an adversarial input detector into a policy network. The policy adaptively adjusts the mixture of the robust base classifier and a standard network, where the standard network is optimized for clean accuracy and is not robust in general. We provide theoretical analyses to motivate the adaptive smoothing procedure, certify the robustness of the smoothed classifier under realistic assumptions, and justify the introduction of the policy network. We use various attack methods, including AutoAttack and adaptive attacks, to empirically verify that the smoothed model noticeably improves the accuracy-robustness trade-off. On the CIFAR-100 dataset, our method simultaneously achieves 80.09\% clean accuracy and 32.94\% AutoAttacked accuracy. The code that implements adaptive smoothing is available at https://github.com/Bai-YT/AdaptiveSmoothing.
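To make the mixing idea concrete, the following is a minimal sketch and not the released implementation. It assumes the mixture is a convex combination of the two networks' softmax probabilities, weighted by a scalar in [0, 1]. The names `softmax`, `mix_probabilities`, and `predict` are hypothetical, the logits are made-up numbers, and the mixing weight is passed in by hand, whereas the method described above would obtain it from the policy network.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_probabilities(std_logits, rob_logits, alpha):
    """Convex combination of class probabilities (assumed form of the mixture).

    alpha = 0 uses only the standard network; alpha = 1 uses only the robust one.
    """
    alpha = min(max(alpha, 0.0), 1.0)  # keep the weight inside [0, 1]
    p_std = softmax(std_logits)
    p_rob = softmax(rob_logits)
    return [(1.0 - alpha) * s + alpha * r for s, r in zip(p_std, p_rob)]


def predict(std_logits, rob_logits, alpha):
    """Index of the most probable class under the mixed classifier."""
    mixed = mix_probabilities(std_logits, rob_logits, alpha)
    return max(range(len(mixed)), key=mixed.__getitem__)


# Hypothetical logits where the two networks disagree on one input.
std_logits = [4.0, 0.0, 0.0]  # standard network favors class 0
rob_logits = [0.0, 2.0, 0.0]  # robust network favors class 1

# A small weight (input judged benign) follows the standard network;
# a large weight (input judged adversarial) follows the robust network.
benign_pred = predict(std_logits, rob_logits, alpha=0.1)
attacked_pred = predict(std_logits, rob_logits, alpha=0.9)
```

In this sketch the prediction follows the standard network when the weight is small and the robust network when it is large, which is the behavior an input-dependent policy is meant to exploit.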