ArmSSL: Adversarial Robust Black-Box Watermarking for Self-Supervised Learning Pre-trained Encoders

Self-supervised learning (SSL) encoders are valuable intellectual property (IP). However, no existing SSL watermarking scheme for IP protection satisfies the following two practical requirements at once: (1) ownership verification under black-box access to the suspect model once the stolen encoder has been deployed in a downstream task; and (2) robustness against adversarial watermark detection and removal, which existing schemes lack because their watermark samples form a distinguishable out-of-distribution (OOD) cluster. We propose ArmSSL, an SSL watermarking framework that provides black-box verifiability and adversarial robustness while preserving utility. For verification, we introduce paired discrepancy enlargement, which enforces feature-space orthogonality between each clean sample and its watermarked counterpart, producing a reliable verification signal under black-box access to the suspect model. For adversarial robustness, ArmSSL combines latent representation entanglement and distribution alignment to suppress OOD clustering. The former entangles watermark representations with clean representations (i.e., those from non-source classes) so that watermark samples do not form a dense cluster, while the latter minimizes the distributional discrepancy between watermark and clean representations, thereby disguising watermark samples as natural in-distribution data. For utility, we design a reference-guided watermark tuning strategy that aligns the watermarked encoder's outputs with those of the original clean encoder on normal data, so that the watermark is learned as a small side task without affecting the main task. Extensive experiments across five mainstream SSL frameworks and nine benchmark datasets, together with end-to-end comparisons against state-of-the-art methods, demonstrate that ArmSSL achieves superior ownership verification, negligible utility degradation, and strong robustness against a range of adversarial detection and removal attacks.
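
As a rough illustration of three of the objectives named above, the sketch below writes them as toy loss functions in plain Python. The abstract does not give the actual loss formulations, so every function name and every functional form here is an assumption made for illustration: squared cosine similarity stands in for the orthogonality constraint, one minus cosine similarity stands in for reference-guided alignment, and a simple mean-matching term stands in for distribution alignment. Latent representation entanglement is not covered.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def paired_discrepancy_loss(clean_feats, wm_feats):
    """Hypothetical stand-in for paired discrepancy enlargement.

    Mean squared cosine similarity over (clean, watermark) feature pairs.
    The value is zero exactly when every pair is orthogonal.
    """
    sims = [cosine(c, w) for c, w in zip(clean_feats, wm_feats)]
    return sum(s * s for s in sims) / len(sims)


def reference_alignment_loss(wm_encoder_feats, ref_encoder_feats):
    """Hypothetical stand-in for reference-guided tuning.

    Mean of (1 - cosine similarity) between the watermarked encoder's
    outputs and the original clean encoder's outputs on the same clean inputs.
    """
    sims = [cosine(w, r) for w, r in zip(wm_encoder_feats, ref_encoder_feats)]
    return sum(1.0 - s for s in sims) / len(sims)


def mean_gap_loss(wm_feats, clean_feats):
    """Hypothetical first-moment stand-in for distribution alignment.

    Squared Euclidean distance between the mean watermark feature and the
    mean clean feature. A real system would likely use a richer discrepancy.
    """
    dim = len(wm_feats[0])
    mu_w = [sum(f[i] for f in wm_feats) / len(wm_feats) for i in range(dim)]
    mu_c = [sum(f[i] for f in clean_feats) / len(clean_feats) for i in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mu_w, mu_c))
```

In a training loop these terms would presumably be combined as a weighted sum and minimized over the encoder's parameters; the weights, and the choice of discrepancy measure for the alignment term, are left open here because the abstract does not state them.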
