Diffusion models have led to significant advancements in generative modelling. Yet their widespread adoption poses challenges regarding data attribution and interpretability. In this paper, we aim to help address such challenges in diffusion models by developing an influence functions framework. Influence function-based data attribution methods approximate how a model's output would have changed had some training data been removed. In supervised learning, this is usually used to predict how the loss on a particular example would change. For diffusion models, we focus on predicting the change in the probability of generating a particular example, via several proxy measurements. We show how to formulate influence functions for such quantities, and how previously proposed methods can be interpreted as particular design choices in our framework. To ensure scalability of the Hessian computations in influence functions, we systematically develop K-FAC approximations based on generalised Gauss-Newton matrices specifically tailored to diffusion models. Our recommended method outperforms previous data attribution approaches on common evaluations, such as the Linear Data-modelling Score (LDS) and retraining without top influences, without requiring method-specific hyperparameter tuning.
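As a minimal, hypothetical sketch (not the paper's method, which targets diffusion models with K-FAC-approximated Hessians), the supervised-learning influence function mentioned above can be illustrated on ridge regression, where the predicted change in a test loss from removing training example $i$ is $\Delta\mathcal{L} \approx \tfrac{1}{n}\,\nabla\mathcal{L}_{\text{test}}(\theta)^\top H^{-1} \nabla\mathcal{L}_i(\theta)$. All names below are illustrative:

```python
import numpy as np

# Toy ridge-regression setup; for a quadratic loss the influence
# approximation is nearly exact, so we can check it against actual retraining.
rng = np.random.default_rng(0)
n, d, lam = 50, 3, 1e-2
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def fit(X, y):
    # Ridge minimiser of (1/n) sum 0.5*(x_j @ w - y_j)^2 + 0.5*lam*||w||^2
    return np.linalg.solve(X.T @ X / len(y) + lam * np.eye(d), X.T @ y / len(y))

theta = fit(X, y)
H = X.T @ X / n + lam * np.eye(d)             # Hessian of the mean objective

x_test, y_test = rng.normal(size=d), 0.5
g_test = (x_test @ theta - y_test) * x_test   # gradient of 0.5*(test residual)^2

i = 0
g_i = (X[i] @ theta - y[i]) * X[i]            # per-example training gradient

# Influence-function estimate of the test-loss change from removing example i
pred_change = g_test @ np.linalg.solve(H, g_i) / n

# Ground truth: actually retrain without example i
theta_loo = fit(np.delete(X, i, axis=0), np.delete(y, i))
actual_change = (0.5 * (x_test @ theta_loo - y_test) ** 2
                 - 0.5 * (x_test @ theta - y_test) ** 2)
```

In large models the Hessian solve above is the bottleneck, which is what motivates structured curvature approximations such as K-FAC in place of the exact inverse.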