While large text-to-image models can synthesize "novel" images, these images are necessarily a reflection of the training data. The problem of data attribution in such models -- determining which images in the training set are most responsible for the appearance of a given generated image -- is difficult yet important. As an initial step toward this problem, we evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style. Our key insight is that this allows us to efficiently create synthetic images that are, by construction, computationally influenced by the exemplar. With our new dataset of such exemplar-influenced images, we are able to evaluate various data attribution algorithms and different possible feature spaces. Furthermore, by training on our dataset, we can tune standard models, such as DINO, CLIP, and ViT, toward the attribution problem. Even though the procedure is tuned toward small exemplar sets, we show generalization to larger sets. Finally, by taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
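To make the idea of soft attribution scores concrete, the following is a minimal sketch and not the method described above. It assumes that each image has already been embedded into some feature space (for example by a DINO, CLIP, or ViT encoder), and it swaps in a temperature-scaled softmax over cosine similarities as one plausible way to turn similarities into a distribution over training images. The function name `soft_attribution_scores`, the `temperature` value, and the random features are all hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_attribution_scores(query_feat, train_feats, temperature=0.1):
    """Hypothetical helper: map feature similarities to soft scores.

    query_feat:  (d,) feature vector of the generated image.
    train_feats: (n, d) feature vectors of candidate training images.
    Returns an (n,) array of non-negative scores that sum to 1.
    """
    # Normalize so that dot products are cosine similarities.
    q = query_feat / np.linalg.norm(query_feat)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    sims = t @ q

    # Temperature-scaled softmax (an assumed choice, not taken from the text).
    logits = sims / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# Toy data standing in for real encoder features.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(5, 16))
# The "generated" image is a slightly perturbed copy of training image 2.
query_feat = train_feats[2] + 0.1 * rng.normal(size=16)

scores = soft_attribution_scores(query_feat, train_feats)
top_index = int(np.argmax(scores))
```

In this toy setup the highest score should fall on the training image the query was derived from, while the remaining probability mass is spread over the other candidates rather than being forced to zero. A lower `temperature` concentrates the scores on the closest match; a higher one spreads them more evenly.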