We propose a conceptually simple and lightweight framework for improving the robustness of vision models by combining knowledge distillation with data augmentation. We challenge the conjecture that larger models do not make better teachers by showing strong gains in out-of-distribution robustness when distilling from pretrained foundation models. Building on this finding, we propose Discrete Adversarial Distillation (DAD). DAD uses a robust teacher to generate adversarial examples and a VQGAN to discretize them, which produces more informative samples than standard data augmentation techniques. We provide a theoretical framework for using a robust teacher in knowledge distillation with data augmentation. We also demonstrate strong gains in out-of-distribution robustness and clean accuracy across different student architectures. Notably, our method adds only minor computational overhead compared to similar techniques and can easily be combined with other data augmentations for further improvements.
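The abstract describes DAD only at a high level. The toy sketch below illustrates its three stages under heavy simplifying assumptions:

- Linear softmax models stand in for the robust teacher and the student.
- A single FGSM step stands in for the teacher's adversarial attack.
- A nearest-neighbour patch codebook stands in for the pretrained VQGAN's encode, quantize and decode pass.

The distillation loss is the standard temperature-scaled KL divergence used in knowledge distillation. Distilling on both clean and discretized inputs is also an assumption, since the abstract does not state the exact objective. All function names (`fgsm_from_teacher`, `vector_quantize`, `kd_loss`, `dad_batch`) are hypothetical and do not come from the authors' code.

```python
import numpy as np


def log_softmax(z, t=1.0):
    """Numerically stable log-softmax with temperature t over the last axis."""
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def cross_entropy(logits, y):
    """Mean cross-entropy of integer labels y under softmax(logits)."""
    return float(-log_softmax(logits)[np.arange(len(y)), y].mean())


def fgsm_from_teacher(x, y, W_t, eps):
    """One-step FGSM against a linear softmax teacher (stand-in for the robust teacher's attack).
    For logits = x @ W_t, d CE / d x = (softmax(x @ W_t) - onehot(y)) @ W_t.T per sample.
    A real image pipeline would also clip to the valid pixel range; omitted here."""
    p = np.exp(log_softmax(x @ W_t))
    onehot = np.eye(W_t.shape[1])[y]
    grad_x = (p - onehot) @ W_t.T
    return x + eps * np.sign(grad_x)


def vector_quantize(x, codebook, patch):
    """Hypothetical stand-in for VQGAN discretization: split each input into
    patches of length `patch` and replace each patch by its nearest codebook row (L2)."""
    n, d = x.shape
    patches = x.reshape(n, d // patch, patch)                       # (n, P, patch)
    dists = ((patches[:, :, None, :] - codebook[None, None]) ** 2).sum(-1)  # (n, P, K)
    idx = dists.argmin(-1)                                          # (n, P)
    return codebook[idx].reshape(n, d)


def kd_loss(student_logits, teacher_logits, t=4.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by t^2."""
    log_p_t = log_softmax(teacher_logits, t)
    log_p_s = log_softmax(student_logits, t)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(-1).mean()
    return float(kl * t * t)


def dad_batch(x, y, W_teacher, W_student, codebook, patch, eps=0.1, t=4.0):
    """Toy DAD-style step: teacher-made adversarial inputs -> discretize -> distill.
    Distilling on both clean and discretized inputs is an assumption."""
    x_adv = fgsm_from_teacher(x, y, W_teacher, eps)
    x_disc = vector_quantize(x_adv, codebook, patch)
    loss = (kd_loss(x @ W_student, x @ W_teacher, t)
            + kd_loss(x_disc @ W_student, x_disc @ W_teacher, t))
    return x_adv, x_disc, loss


# Usage on random toy data.
rng = np.random.default_rng(0)
n, d, k, patch = 8, 12, 3, 4
x = rng.normal(size=(n, d))
y = rng.integers(0, k, size=n)
W_teacher = rng.normal(size=(d, k))
W_student = rng.normal(size=(d, k))
codebook = rng.normal(size=(16, patch))
x_adv, x_disc, loss = dad_batch(x, y, W_teacher, W_student, codebook, patch)
print(x_disc.shape, round(loss, 4))
```

A linear teacher keeps the FGSM step easy to check. Its cross-entropy is convex in the input, so one signed-gradient step cannot decrease the teacher's loss. In the actual method, the VQGAN reconstruction would replace `vector_quantize`, and both networks would be deep models trained with automatic differentiation.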