Knowledge distillation transfers knowledge from large teacher models to compact student models, enabling deployment on resource-limited platforms with minimal performance degradation. However, this paradigm also introduces security risks, most notably model theft. Existing defenses against model theft, such as watermarking and secure enclaves, focus primarily on identity authentication and incur significant resource costs. To provide post-theft accountability and traceability, we propose a novel fingerprinting framework that superimposes device-specific Physical Unclonable Function (PUF) signatures onto the teacher logits during distillation. Compared with watermarking or secure enclaves, our approach is lightweight, requires no architectural changes, and enables tracing of any leaked or cloned model. Because the signatures are derived from PUFs, the framework is robust against reverse-engineering and tampering attacks. Signature recovery proceeds in two stages: a neural-network-based decoder followed by a Hamming-distance decoder. We further propose a bit-compression scheme to support a large number of devices. Experimental results demonstrate that our framework achieves a high key recovery rate and negligible accuracy loss while allowing a tunable trade-off between these two key metrics. These results show that the proposed framework is a practical and robust solution for protecting distilled models.
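As a rough illustration of the superimposition step described above, the following is a minimal sketch of how a device-specific bit signature might be added to teacher logits inside a standard distillation loss. It is not the paper's exact scheme: the carrier positions, the perturbation strength `alpha`, and the helper names (`perturbed_teacher_logits`, `signature_bits`) are all assumptions introduced for illustration.

```python
# Illustrative sketch only: one possible way to superimpose a device-specific
# PUF-derived signature onto teacher logits during distillation.
# Carrier positions, alpha, and all helper names are hypothetical assumptions.
import torch
import torch.nn.functional as F

def perturbed_teacher_logits(teacher_logits, signature_bits, alpha=0.05):
    """Add a small, sign-coded perturbation to selected logit positions.

    teacher_logits: (batch, num_classes) raw teacher outputs
    signature_bits: (k,) tensor of 0/1 bits derived from a PUF response (hypothetical)
    alpha: perturbation strength, trading accuracy loss against recovery rate
    """
    perturb = torch.zeros_like(teacher_logits)
    signs = signature_bits.float() * 2.0 - 1.0          # map {0,1} -> {-1,+1}
    perturb[:, : signature_bits.numel()] = alpha * signs  # assumed carrier positions
    return teacher_logits + perturb

def distillation_loss(student_logits, teacher_logits, signature_bits, T=4.0):
    """Standard KL-based distillation against the signature-carrying teacher."""
    soft_teacher = F.softmax(
        perturbed_teacher_logits(teacher_logits, signature_bits) / T, dim=1
    )
    log_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (T * T)
```

In such a setup, `alpha` would be the knob behind the tunable trade-off mentioned above: a larger perturbation makes the signature easier to recover from the student but costs more accuracy.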