The goal of data attribution for text-to-image models is to identify the training images that most influence the generation of a new image. We can define "influence" by saying that, for a given output, if a model is retrained from scratch without that output's most influential images, the model should then fail to generate that output image. Unfortunately, directly searching for these influential images is computationally infeasible, since it would require repeatedly retraining the model from scratch. We propose a new approach that efficiently identifies highly influential images. Specifically, we simulate unlearning the synthesized image: we propose a method to increase the training loss on the output image without catastrophically forgetting other, unrelated concepts. Then, we find training images that are forgotten by proxy, identifying those with significant loss deviations after the unlearning process, and label these as influential. We evaluate our method against a computationally intensive but "gold-standard" retraining from scratch and demonstrate our method's advantages over previous methods.
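The procedure described above can be illustrated with a minimal toy sketch. Here a 1-D linear model stands in for the text-to-image model, "unlearning" is a few small gradient-ascent steps on the output's loss, and influence is scored by per-example loss deviation. All names (`loss`, `w_unlearned`, the step size and step count) are illustrative assumptions, not the paper's actual method or hyperparameters.

```python
# Toy sketch of attribution-by-unlearning (illustrative only, not the paper's method).
import numpy as np

rng = np.random.default_rng(0)

# Toy "training set": inputs and targets for a linear model y = w * x.
X = rng.normal(size=20)
Y = 2.0 * X + 0.05 * rng.normal(size=20)

def loss(w, x, y):
    # Squared-error loss per example.
    return (w * x - y) ** 2

# "Train" the model: closed-form least-squares fit of w.
w = float(np.dot(X, Y) / np.dot(X, X))

# The "synthesized output" we want to attribute (here, a point near the training data).
x_out, y_out = X[3], Y[3]

# Simulated unlearning: gradient *ascent* on the output's loss for a few small steps.
# Small steps keep the rest of the fit roughly intact (no catastrophic forgetting).
w_unlearned = w
lr = 0.05
for _ in range(10):
    grad = 2.0 * (w_unlearned * x_out - y_out) * x_out
    w_unlearned += lr * grad  # ascent: increase the loss on the output

# Influence proxy: training examples whose loss rises most after unlearning
# are "forgotten by proxy" and labeled influential.
deviation = loss(w_unlearned, X, Y) - loss(w, X, Y)
influential = np.argsort(-deviation)[:5]  # indices of the top-5 candidates
```

In the real setting, the loss-ascent step and the per-example loss deviations would be computed over the diffusion model's training objective rather than a scalar regression loss; the ranking step is the same idea.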