Generative models, especially text-to-image diffusion models, have advanced significantly in their ability to generate images, benefiting from improved architectures, increased computational power, and large-scale datasets. While these datasets play an important role, their protection remains an unsolved issue. Current protection strategies, such as watermarking and membership inference, either require a high poison rate that is detrimental to image quality or suffer from low accuracy and robustness. In this work, we introduce a novel approach, EnTruth, which Enhances Traceability of unauthorized dataset usage by utilizing template memorization. By strategically incorporating template memorization, EnTruth can trigger a specific behavior in unauthorized models that serves as evidence of infringement. Our method is the first to investigate a positive application of memorization and use it for copyright protection, turning a curse into a blessing and offering a pioneering perspective on unauthorized-usage detection in generative models. Comprehensive experiments demonstrate its effectiveness in terms of data-alteration rate, accuracy, robustness, and generation quality.