SSL-WM: A Black-Box Watermarking Approach for Encoders Pre-trained by Self-supervised Learning

Recent years have witnessed tremendous success in Self-Supervised Learning (SSL), which is widely used to facilitate downstream tasks in Computer Vision (CV) and Natural Language Processing (NLP). However, attackers may steal SSL models and commercialize them for profit, making it crucial to verify the ownership of such models. Most existing ownership protection solutions (e.g., backdoor-based watermarks) are designed for supervised learning models and cannot be applied directly, because they require the models' downstream tasks and target labels to be known and available during watermark embedding, which is not always possible in SSL. To address this problem, especially when downstream tasks are diverse and unknown during watermark embedding, we propose SSL-WM, a novel black-box watermarking solution for verifying the ownership of SSL models. SSL-WM maps watermarked inputs of the protected encoders into an invariant representation space, which causes any downstream classifier to produce the expected behavior and thus allows the embedded watermarks to be detected. We evaluate SSL-WM on a range of CV and NLP tasks, using both contrastive-based and generative-based SSL models. Experimental results demonstrate that SSL-WM can effectively verify the ownership of stolen SSL models across various downstream tasks. Furthermore, SSL-WM is robust against model fine-tuning, pruning, and input preprocessing attacks. Lastly, SSL-WM evades the watermark detection approaches we evaluated, demonstrating its promise for protecting the ownership of SSL models.
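The black-box verification idea can be illustrated with a small sketch. The abstract does not state which statistic SSL-WM uses to decide ownership, so everything below is an assumption made for illustration: the helper names `label_concentration` and `looks_watermarked`, the `margin` threshold, and the toy models are all hypothetical. The sketch relies only on the property described above: if watermarked inputs collapse to one representation, any downstream classifier should assign them nearly the same label, whereas clean inputs spread across labels.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def label_concentration(labels: Sequence[Hashable]) -> float:
    """Fraction of predictions that fall on the single most frequent label."""
    if not labels:
        raise ValueError("need at least one prediction")
    return Counter(labels).most_common(1)[0][1] / len(labels)


def looks_watermarked(
    query: Callable[[object], Hashable],
    clean_inputs: Sequence[object],
    watermarked_inputs: Sequence[object],
    margin: float = 0.3,  # hypothetical threshold, not taken from the paper
) -> bool:
    """Black-box check: only the suspect model's predicted labels are used."""
    clean = label_concentration([query(x) for x in clean_inputs])
    marked = label_concentration([query(x) for x in watermarked_inputs])
    # A watermarked encoder should make predictions on watermarked inputs
    # far more concentrated than predictions on clean inputs.
    return marked - clean >= margin


# Toy stand-ins for suspect models (assumptions, not real SSL encoders).
# Each input is a pair (value, has_trigger).
def stolen_model(x):
    value, has_trigger = x
    representation = 0 if has_trigger else value  # trigger collapses the representation
    return representation % 4  # arbitrary downstream classifier


def independent_model(x):
    value, _ = x  # ignores the trigger entirely
    return value % 4


clean = [(v, False) for v in range(20)]
marked = [(v, True) for v in range(20)]

print(looks_watermarked(stolen_model, clean, marked))
print(looks_watermarked(independent_model, clean, marked))
```

Because the check never inspects weights or target labels, it works the same way whatever task the downstream classifier was trained for, which is the setting the abstract targets. A real verification would need a statistically justified threshold rather than a fixed margin.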
