While large text-to-image models are able to synthesize "novel" images, these images are necessarily a reflection of the training data. The problem of data attribution in such models -- which of the images in the training set are most responsible for the appearance of a given generated image -- is difficult yet important. As an initial step toward this problem, we evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style. Our key insight is that this allows us to efficiently create synthetic images that are, by construction, computationally influenced by the exemplar. With our new dataset of such exemplar-influenced images, we are able to evaluate various data attribution algorithms and candidate feature spaces. Furthermore, by training on our dataset, we can tune standard models, such as DINO, CLIP, and ViT, toward the attribution problem. Even though the procedure is tuned toward small exemplar sets, we show generalization to larger sets. Finally, by taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
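To make the idea of soft attribution scores concrete, the sketch below turns feature similarities between one generated image and a set of training images into a probability distribution. It is only an illustration under stated assumptions: the feature vectors are made-up toy values standing in for embeddings from a model such as DINO or CLIP, and the temperature-scaled softmax over cosine similarities is one common way to obtain soft scores, not necessarily the procedure used in this work. The names `cosine_similarity`, `soft_attribution`, and `temperature` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def soft_attribution(query, candidates, temperature=0.1):
    """Return one score per candidate; scores are positive and sum to 1.

    Assumption: a softmax over cosine similarities, scaled by a
    hypothetical temperature, is used as the soft attribution score.
    """
    logits = [cosine_similarity(query, c) / temperature for c in candidates]
    # Subtract the maximum logit for numerical stability before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy features (assumed values, not real embeddings).
generated = [0.9, 0.1, 0.0]
training = [
    [1.0, 0.0, 0.0],  # close to the generated image
    [0.0, 1.0, 0.0],  # mostly unrelated
    [0.7, 0.7, 0.0],  # partially related
]

scores = soft_attribution(generated, training)
for index, score in enumerate(scores):
    print(f"training image {index}: {score:.3f}")
```

A lower temperature concentrates the mass on the most similar training image, while a higher one spreads it more evenly, which is one simple way to express uncertainty about which images were influential.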