Data-free knowledge distillation (KD) transfers knowledge from a pre-trained model (the teacher model) to a smaller model (the student model) without access to the original data used to train the teacher. However, the security of the synthetic or out-of-distribution (OOD) data required by data-free KD is largely unknown and under-explored. In this work, we make the first effort to uncover the security risks of data-free KD with respect to untrusted pre-trained models. We then propose Anti-Backdoor Data-Free KD (ABD), the first plug-in defense for data-free KD methods that reduces the chance of potential backdoors being transferred. We empirically show that ABD diminishes transferred backdoor knowledge while maintaining downstream performance comparable to vanilla KD. We envision this work as a milestone for raising awareness of, and mitigating, potential backdoors in data-free KD. Code is released at https://github.com/illidanlab/ABD.