Backdoor watermarking has emerged as the predominant approach for protecting public datasets, enabling dataset ownership verification (DOV) through embedded triggers that induce predefined model behaviors. While existing works assume that DOV results can serve as reliable evidence for copyright infringement claims, we argue that this assumption is fundamentally flawed. In this paper, we expose critical vulnerabilities in current backdoor watermarking schemes by demonstrating that attackers can forge watermarks that are statistically indistinguishable from the original ones, thereby evading infringement allegations. Specifically, we propose the Forged Watermark Generator (FW-Gen), a lightweight variational autoencoder-based framework that generates forged watermarks which preserve the statistical properties of the original watermarks while exhibiting distinct visual patterns. Our attack operates under a realistic threat model: an accused attacker, upon receiving an infringement claim, extracts watermark information from the protected dataset and produces counterfeit evidence to refute the allegation. Extensive experiments across six backdoor watermarking methods, two benchmark datasets, and two model architectures demonstrate that forged watermarks achieve statistical significance in hypothesis testing equal to or greater than that of the original watermarks. These findings reveal that current DOV mechanisms are insufficient as standalone evidence in copyright disputes and call for more robust dataset protection schemes.