We present a novel adversarially penalized self-knowledge distillation method, named adversarial learning and implicit regularization for self-knowledge distillation (AI-KD), which regularizes the training procedure through adversarial learning and implicit distillation. Our model distills deterministic and progressive knowledge from the predictive probabilities of a pre-trained model and of the previous epoch, and it also transfers knowledge of the deterministic predictive distributions using adversarial learning. The motivation is that self-knowledge distillation methods regularize the predictive probabilities with soft targets, but the exact distributions may be hard to predict. Our method deploys a discriminator to distinguish between the distributions of the pre-trained and student models, while the student model is trained to fool the discriminator during training. Thus, the student model not only learns the pre-trained model's predictive probabilities but also aligns its distribution with that of the pre-trained model. We demonstrate the effectiveness of the proposed method across network architectures on multiple datasets and show that it achieves better performance than state-of-the-art methods.
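The abstract does not state the loss functions, so the following is only a minimal sketch of how the three ingredients it names (distillation from a pre-trained model, distillation from the previous epoch, and an adversarial term driven by a discriminator) could be combined for a single sample. The weights `alpha`, `beta`, `gamma`, the `temperature`, the toy linear discriminator, and all function names are assumptions made for illustration, not the authors' formulation.

```python
import math

EPS = 1e-12  # numerical guard for logarithms


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target, pred):
    """KL(target || pred) between two discrete distributions."""
    return sum(t * math.log((t + EPS) / (p + EPS)) for t, p in zip(target, pred))


def discriminator_score(probs, weights, bias=0.0):
    """Hypothetical linear discriminator: estimated probability that
    `probs` was produced by the pre-trained model."""
    z = sum(w * p for w, p in zip(weights, probs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def student_loss(student_logits, pretrained_logits, prev_epoch_logits, label,
                 disc_weights, alpha=1.0, beta=1.0, gamma=0.1, temperature=3.0):
    """Cross-entropy plus two distillation terms plus an adversarial term
    (all weights are assumed values)."""
    s_soft = softmax(student_logits, temperature)
    ce = -math.log(softmax(student_logits)[label] + EPS)
    # Deterministic knowledge: soft targets from the pre-trained model.
    kd_pre = kl_divergence(softmax(pretrained_logits, temperature), s_soft)
    # Progressive knowledge: soft targets from the previous epoch.
    kd_prev = kl_divergence(softmax(prev_epoch_logits, temperature), s_soft)
    # Adversarial term: the student tries to be scored as "pre-trained".
    adv = -math.log(discriminator_score(s_soft, disc_weights) + EPS)
    return ce + alpha * kd_pre + beta * kd_prev + gamma * adv


def discriminator_loss(student_logits, pretrained_logits, disc_weights,
                       temperature=3.0):
    """Binary cross-entropy: pre-trained outputs are labeled real,
    student outputs are labeled fake."""
    d_real = discriminator_score(softmax(pretrained_logits, temperature), disc_weights)
    d_fake = discriminator_score(softmax(student_logits, temperature), disc_weights)
    return -math.log(d_real + EPS) - math.log(1.0 - d_fake + EPS)
```

In an actual training loop the two losses would be minimized alternately, with the discriminator updated on `discriminator_loss` and the student updated on `student_loss`; the alternation schedule is likewise an assumption here.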