Images generated by diffusion models like Stable Diffusion are increasingly widespread. Recent works and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user. In this paper, we first analyze this memorization problem in text-to-image diffusion models. While it is widely believed that duplicated images in the training set are responsible for content replication at inference time, we observe that the text conditioning of the model plays a similarly important role. In fact, we see in our experiments that data replication often does not happen for unconditional models, while it is common in the text-conditional case. Motivated by our findings, we then propose several techniques for reducing data replication at both training and inference time by randomizing and augmenting image captions in the training set.
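To make the idea of randomizing image captions concrete, the following is a minimal sketch of one possible caption perturbation: replacing each word with a random filler word with some probability. The abstract does not specify the exact procedures, so the function name `randomize_caption`, the `replace_prob` parameter, and the filler vocabulary are all illustrative assumptions rather than the method evaluated in the paper.

```python
import random

# Hypothetical filler vocabulary; a real setup would more plausibly sample
# from the text encoder's tokenizer vocabulary.
FILLER_WORDS = ("photo", "image", "picture", "scene")


def randomize_caption(caption, rng, replace_prob=0.1, vocab=FILLER_WORDS):
    """Return a copy of `caption` in which each whitespace-separated word
    is independently replaced by a random word from `vocab` with
    probability `replace_prob`.

    This is an illustrative assumption about what caption randomization
    could look like, not the procedure from the paper.
    """
    words = caption.split()
    perturbed = [
        rng.choice(vocab) if rng.random() < replace_prob else word
        for word in words
    ]
    return " ".join(perturbed)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so that runs are repeatable
    original = "a painting of a lighthouse at sunset"
    print(randomize_caption(original, rng, replace_prob=0.3))
```

Under this sketch, the perturbation would be applied each time an image-caption pair is drawn during training, so that a duplicated image is not always paired with an identical text condition.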