Training data attribution (TDA) should enable the interpretability of generative models and support a variety of related downstream tasks. However, current TDA approaches lack reliability and robustness, which prevents their adoption in real-world setups. In this paper, we take a decisive step towards more reliable and robust TDA for diffusion models. We propose to perform TDA with mirrored unlearning and noise-consistent skew (MUCS). The idea is to fine-tune a second model with bounded mirrored gradient ascent, and to measure the normalized skew of this model with respect to the original one using consistent noise samples. We show that, while conceptually simple and generic, MUCS systematically outperforms existing methods by a large margin on three different datasets. We additionally study the effect of core design choices on final performance, and analyze novel aspects regarding the overlap of influential instances across generated items and the potential of ensembling TDA approaches. We believe that our findings may have broader implications for more general unlearning setups, as well as for tasks requiring the comparison of diffusion losses.
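To illustrate what comparing diffusion losses under consistent noise samples can look like, the following is a minimal sketch and not the MUCS implementation. It assumes a standard noise-prediction loss, and it assumes that the skew is the relative loss change between the two models; the abstract does not give the exact formula. The names `diffusion_loss`, `noise_consistent_skew`, and `alphas_bar` are hypothetical, and the models are plain callables standing in for the original and the unlearned network.

```python
import numpy as np

def diffusion_loss(model, x0, timesteps, noises, alphas_bar):
    """Mean noise-prediction MSE over a fixed list of (t, eps) pairs."""
    losses = []
    for t, eps in zip(timesteps, noises):
        a = alphas_bar[t]
        # Standard forward diffusion: mix the clean sample with noise.
        x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
        losses.append(np.mean((model(x_t, t) - eps) ** 2))
    return float(np.mean(losses))

def noise_consistent_skew(original, unlearned, x0, alphas_bar,
                          n_samples=32, seed=0):
    """Relative loss change of `unlearned` w.r.t. `original` (assumed form).

    Both models are scored on the SAME timesteps and noise draws, so the
    difference reflects the models rather than sampling variance.
    """
    rng = np.random.default_rng(seed)
    timesteps = rng.integers(0, len(alphas_bar), size=n_samples)
    noises = rng.standard_normal((n_samples,) + x0.shape)
    loss_orig = diffusion_loss(original, x0, timesteps, noises, alphas_bar)
    loss_unl = diffusion_loss(unlearned, x0, timesteps, noises, alphas_bar)
    # Assumed normalization: divide by the original model's loss.
    return (loss_unl - loss_orig) / (loss_orig + 1e-12)

if __name__ == "__main__":
    # Toy stand-ins for real denoisers (illustrative only).
    original = lambda x_t, t: np.zeros_like(x_t)
    unlearned = lambda x_t, t: np.full_like(x_t, 10.0)
    x0 = np.ones(16)
    alphas_bar = np.linspace(0.999, 0.01, 100)
    print(noise_consistent_skew(original, unlearned, x0, alphas_bar))
```

Sharing the timesteps and noise draws across both evaluations is the point of the sketch: with independent draws, the sampling variance of the diffusion loss could easily mask the change caused by unlearning.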