Incorporating diffusion-generated synthetic data into adversarial training (AT) has been shown to substantially improve the training of robust image classifiers. In this work, we extend the role of diffusion models beyond merely generating synthetic data, examining whether their internal representations, which encode meaningful features of the data, can provide additional benefits for robust classifier training. Through systematic experiments, we show that diffusion models offer representations that are both diverse and partially robust, and that explicitly incorporating diffusion representations as an auxiliary learning signal during AT consistently improves robustness across settings. Furthermore, our representation analysis indicates that incorporating diffusion models into AT encourages more disentangled features, while diffusion representations and diffusion-generated synthetic data play complementary roles in shaping representations. Experiments on CIFAR-10, CIFAR-100, and ImageNet validate these findings, demonstrating the effectiveness of jointly leveraging diffusion representations and synthetic data within AT.