Text-to-image synthesis has become highly popular for generating realistic and stylized images, often requiring fine-tuning generative models with domain-specific datasets for specialized tasks. However, these valuable datasets face risks of unauthorized usage and unapproved sharing, compromising the rights of their owners. In this paper, we address the issue of dataset abuse during the fine-tuning of Stable Diffusion models for text-to-image synthesis. We present a dataset watermarking framework designed to detect unauthorized usage and trace data leaks. The framework employs two key strategies across multiple watermarking schemes and is effective for large-scale dataset authorization. Extensive experiments demonstrate the framework's effectiveness, its minimal impact on the dataset (modifying only 2% of the data suffices for high detection accuracy), and its ability to trace data leaks. Our results also highlight the robustness and transferability of the framework, proving its practical applicability in detecting dataset abuse.