Knowledge Distillation (KD) transfers the discriminative capabilities of an advanced teacher model to a simpler student model, improving the student's performance without sacrificing accuracy. KD can also be exploited for model stealing attacks, in which adversaries use it to mimic the functionality of a teacher model. Recent work in this area has been influenced by the Stingy Teacher model, whose empirical analysis showed that sparse outputs can significantly degrade the performance of student models. To address the risk of intellectual property leakage, we introduce an approach, inspired by the Nasty Teacher concept, for training a teacher model that inherently protects its logits. Unlike existing methods, we combine sparse outputs of adversarial examples with standard training data to strengthen the teacher's defense against student distillation. Our approach reduces the relative entropy between the original and adversarially perturbed outputs, allowing the model to produce adversarial logits with minimal impact on overall performance. The source code will be made publicly available soon.
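As a rough illustration of the two ingredients named above (sparse outputs and relative entropy), the following minimal sketch sparsifies a softmax output by keeping its top-k entries and then computes the KL divergence against the clean output. It is not the authors' implementation: the function names, the logit values, the choice of k, and the direction of the KL term are all assumptions made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sparsify_top_k(probs, k):
    """Keep the k largest probabilities, zero the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


def kl_divergence(p, q, eps=1e-12):
    """Relative entropy KL(p || q); entries with p_i == 0 contribute nothing."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


# Hypothetical logits for one clean input and its adversarially perturbed copy.
clean_logits = [4.0, 1.5, 0.5, -1.0]
adv_logits = [3.2, 2.1, 0.1, -0.8]

p_clean = softmax(clean_logits)
p_adv_sparse = sparsify_top_k(softmax(adv_logits), k=2)  # k=2 is an assumption

# Assumed direction: sparse adversarial output relative to the clean output.
loss_term = kl_divergence(p_adv_sparse, p_clean)
print(f"KL(sparse adversarial || clean) = {loss_term:.4f}")
```

In an actual training loop, a term like this would typically be combined with the standard classification loss and computed on batched tensors in a deep learning framework; how the terms are weighted is not specified in the abstract.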