In this work, we propose StegGuard, a novel fingerprinting mechanism that uses steganography to verify the ownership of a suspect pre-trained encoder. The key insight behind StegGuard is that the unique characteristic of the image-to-embedding transformation performed by a pre-trained encoder can be equivalently exposed by how an embedder embeds secrets into images and how an extractor recovers those secrets, with a tolerable error, from the encoder's embeddings after the secrets have passed through the encoder's transformation. While each independent encoder has a distinct transformation, a pirated encoder has a transformation similar to the victim's. Based on these observations, we learn a secret embedder and extractor pair as the fingerprint of the victim encoder. We introduce a frequency-domain channel attention embedding block into the embedder to adaptively embed secrets into suitable frequency bands. During verification, if the secrets embedded into the query images can be extracted with an acceptable error from the suspect encoder's embeddings, the suspect encoder is judged to be pirated; otherwise, it is judged to be independent. Extensive experiments demonstrate that, relying on only a very limited number of query images, StegGuard identifies ownership reliably across varied independent encoders and is robust against model-stealing-related attacks, including model extraction, fine-tuning, pruning, embedding noising, and embedding shuffling.
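The verification step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the function names (`bit_error_rate`, `verify_ownership`), the use of mean bit error rate as the error measure, the `max_error` threshold of 0.1, and the toy embedder, extractor, and encoders, which stand in for the learned networks.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Bits = List[int]


def bit_error_rate(sent: Sequence[int], recovered: Sequence[int]) -> float:
    """Fraction of secret bits that were recovered incorrectly."""
    if len(sent) != len(recovered):
        raise ValueError("secret and recovered bits must have equal length")
    return sum(s != r for s, r in zip(sent, recovered)) / len(sent)


def verify_ownership(
    embedder: Callable[[Vector, Bits], Vector],
    extractor: Callable[[Vector], Bits],
    suspect_encoder: Callable[[Vector], Vector],
    query_images: Sequence[Vector],
    secrets: Sequence[Bits],
    max_error: float = 0.1,  # hypothetical threshold, not taken from the paper
) -> bool:
    """Return True if the suspect encoder is judged to be pirated."""
    errors = []
    for image, secret in zip(query_images, secrets):
        stego = embedder(image, secret)        # hide the secret in the image
        embedding = suspect_encoder(stego)     # query the suspect encoder
        recovered = extractor(embedding)       # try to read the secret back
        errors.append(bit_error_rate(secret, recovered))
    mean_error = sum(errors) / len(errors)
    return mean_error <= max_error


# --- Toy stand-ins (assumptions): real components would be trained networks ---
def toy_embedder(image: Vector, secret: Bits) -> Vector:
    # Shift each pixel up for a 1 bit and down for a 0 bit.
    return [p + (1.0 if b else -1.0) for p, b in zip(image, secret)]


def toy_extractor(embedding: Vector) -> Bits:
    # Read each bit from the sign of the corresponding embedding entry.
    return [1 if v > 0 else 0 for v in embedding]


def pirated_encoder(x: Vector) -> Vector:
    # Slightly perturbed copy of a victim encoder that scales inputs by 2.
    return [1.9 * v + 0.05 for v in x]


def independent_encoder(x: Vector) -> Vector:
    # Unrelated transformation: the extractor's assumptions no longer hold.
    return [-v for v in x]


query_images = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
secrets = [[1, 0, 1, 1], [0, 0, 1, 0]]

pirated_verdict = verify_ownership(
    toy_embedder, toy_extractor, pirated_encoder, query_images, secrets
)
independent_verdict = verify_ownership(
    toy_embedder, toy_extractor, independent_encoder, query_images, secrets
)
```

In this toy setting the perturbed copy preserves the sign of every embedding entry, so the secrets survive and the verdict is "pirated", while the unrelated encoder flips every bit and is judged independent. A real threshold would presumably be calibrated against the error rates observed on known independent encoders.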