Self-supervised learning (SSL), which trains powerful encoders on unlabeled datasets, has achieved significant success in recent years. These encoders serve as feature extractors for downstream tasks, and training them requires substantial resources. However, protecting the intellectual property of encoder trainers and ensuring the trustworthiness of deployed encoders remain significant open problems in SSL. Moreover, recent research highlights threats to pre-trained encoders, such as backdoor and adversarial attacks. To address these gaps, we propose SSL-Auth, the first authentication framework designed specifically for pre-trained encoders. In particular, SSL-Auth uses selected key samples as watermark information and trains a verification network to reconstruct that information, thereby verifying the integrity of the encoder without compromising model performance. Malicious alterations can be detected by comparing the reconstruction results on the key samples, because a modified encoder will not reproduce the original reconstruction. Comprehensive evaluations on various encoders and diverse downstream tasks demonstrate the effectiveness of SSL-Auth and its fragility, that is, its sensitivity to modifications of the encoder.
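The verification idea described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not the SSL-Auth architecture: the encoder is a single tanh layer, the "verification network" is a linear decoder fitted by least squares, and the names `encode`, `fit_verifier`, `reconstruction_error`, `is_authentic`, and the threshold value are hypothetical.

```python
import numpy as np

def encode(weights, x):
    """Toy stand-in for a pre-trained encoder (assumption: one tanh layer)."""
    return np.tanh(x @ weights)

def fit_verifier(embeddings, key_samples):
    """Fit a linear decoder that maps embeddings back to the key samples.

    Assumption: a least-squares fit replaces the trained verification network.
    """
    decoder, *_ = np.linalg.lstsq(embeddings, key_samples, rcond=None)
    return decoder

def reconstruction_error(weights, decoder, key_samples):
    """Mean squared error between the key samples and their reconstruction."""
    recon = encode(weights, key_samples) @ decoder
    return float(np.mean((recon - key_samples) ** 2))

def is_authentic(weights, decoder, key_samples, threshold=1e-6):
    """Accept the encoder only if it still reconstructs the key samples.

    The threshold is a hypothetical value chosen for this toy setup.
    """
    return reconstruction_error(weights, decoder, key_samples) <= threshold

rng = np.random.default_rng(0)
key_samples = rng.normal(size=(8, 16))      # 8 key samples acting as the watermark
weights = 0.1 * rng.normal(size=(16, 32))   # weights of the "original" encoder

# The owner fits the verifier once, against the original encoder.
decoder = fit_verifier(encode(weights, key_samples), key_samples)
original_error = reconstruction_error(weights, decoder, key_samples)

# Any later change to the encoder weights (here: small random noise)
# shifts the embeddings, so the stored decoder no longer reconstructs the keys.
tampered = weights + 0.05 * rng.normal(size=weights.shape)
tampered_error = reconstruction_error(tampered, decoder, key_samples)

print("original encoder accepted:", is_authentic(weights, decoder, key_samples))
print("tampered encoder accepted:", is_authentic(tampered, decoder, key_samples))
```

In this sketch the decoder is fitted only on the key samples, so it is tightly coupled to the exact embeddings of the original encoder. That coupling is what makes the check fragile: a modification of the weights shows up as a larger reconstruction error.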