This paper describes a simple yet effective technique for refining a pretrained classifier network. The proposed AdCorDA method modifies the training set and exploits the duality between network weights and layer inputs, an approach we call input space training. The method consists of two stages: adversarial correction followed by domain adaptation. In the first stage, adversarial attacks are used to correct the training-set samples that the network misclassifies; the misclassified samples are removed and replaced with their adversarially corrected versions to form a new training set. In the second stage, domain adaptation is performed from this new training set back to the original training set. Extensive experimental validation shows significant accuracy gains of over 5% on the CIFAR-100 dataset. The technique also applies directly to the refinement of weight-quantized neural networks, where experiments show a substantial improvement over the baseline. In addition, adversarial correction enhances robustness to adversarial attacks.
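The adversarial correction stage can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the classifier is a toy linear softmax model, the attack is an iterative signed-gradient step on the input toward the true label, and the names `adversarially_correct`, `build_corrected_set`, `step`, and `max_iters` are hypothetical. The abstract does not say which attack or which domain adaptation method is used, so the second stage is not shown.

```python
import math

def logits(W, b, x):
    # Linear classifier: one row of W and one bias per class.
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)

def sign(v):
    return (v > 0) - (v < 0)

def adversarially_correct(W, b, x, y, step=0.1, max_iters=100):
    """Perturb input x until the fixed classifier predicts the true label y.

    Assumption: a signed-gradient descent on the cross-entropy loss with
    respect to the input. The weights W and b are never modified.
    """
    x = list(x)
    for _ in range(max_iters):
        if predict(W, b, x) == y:
            break
        p = softmax(logits(W, b, x))
        # d(loss)/d(x_j) = sum_k (p_k - 1[k == y]) * W[k][j]
        grad = [
            sum((p[k] - (1.0 if k == y else 0.0)) * W[k][j] for k in range(len(W)))
            for j in range(len(x))
        ]
        x = [xj - step * sign(g) for xj, g in zip(x, grad)]
    # If the iteration budget runs out, x may still be misclassified.
    return x

def build_corrected_set(W, b, X, Y):
    # Keep correctly classified samples; replace misclassified ones
    # with their adversarially corrected versions.
    return [
        list(x) if predict(W, b, x) == y else adversarially_correct(W, b, x, y)
        for x, y in zip(X, Y)
    ]

# Toy data: two classes, the predicted class is the larger coordinate.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
X = [[1.0, 0.0], [0.2, 0.5]]   # the second sample is misclassified as class 1
Y = [0, 0]
corrected = build_corrected_set(W, b, X, Y)
```

In this toy setup the first sample is left untouched and the second is moved across the decision boundary, so the fixed classifier labels every sample in `corrected` correctly. In the method described above, this corrected set would then serve as the starting domain for the domain adaptation stage.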