AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models

Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models (DPMs) with user-provided concepts. This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents. Since the sampling procedure of DPMs involves recursive calls to the denoising UNet, na\"ive gradient backpropagation requires storing the intermediate states of all iterations, resulting in extremely high memory consumption. To overcome this issue, we propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs. It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters (including conditioning signals, network weights, and initial noises) by solving another augmented ODE. To reduce numerical errors in both the forward generation and gradient backpropagation processes, we further reparameterize the probability-flow ODE and augmented ODE as simple non-stiff ODEs using exponential integration. Finally, we demonstrate the effectiveness of AdjointDPM on three interesting tasks: converting visual effects into identification text embeddings, finetuning DPMs for specific types of stylization, and optimizing initial noise to generate adversarial samples for security auditing.

翻译：现有定制化方法需要访问多个参考样本，才能将预训练的扩散概率模型与用户提供的概念对齐。本文旨在解决当唯一可用的监督信号是定义在生成内容上的可微度量时，扩散概率模型的定制化挑战。由于扩散概率模型的采样过程涉及对去噪UNet的递归调用，朴素梯度反向传播需要存储所有迭代的中间状态，导致极高的内存消耗。为解决这一问题，我们提出了一种新方法AdjointDPM，首先通过求解对应的概率流常微分方程从扩散模型生成新样本，然后利用伴随灵敏度方法，通过求解另一个增广常微分方程将损失梯度反向传播至模型参数（包括条件信号、网络权重和初始噪声）。为减少正向生成和梯度反向传播过程中的数值误差，我们进一步利用指数积分将概率流常微分方程和增广常微分方程重参数化为简单的非刚性常微分方程。最后，我们在三个有趣的场景中验证了AdjointDPM的有效性：将视觉效果转换为识别文本嵌入、针对特定风格化类型微调扩散概率模型、以及优化初始噪声以生成用于安全审计的对抗样本。

相关内容

反向传播

关注 354

反向传播一词严格来说仅指用于计算梯度的算法，而不是指如何使用梯度。但是该术语通常被宽松地指整个学习算法，包括如何使用梯度，例如通过随机梯度下降。反向传播将增量计算概括为增量规则中的增量规则，该规则是反向传播的单层版本，然后通过自动微分进行广义化，其中反向传播是反向累积（或“反向模式”）的特例。在机器学习中，反向传播（backprop）是一种广泛用于训练前馈神经网络以进行监督学习的算法。对于其他人工神经网络（ANN）都存在反向传播的一般化–一类算法，通常称为“反向传播”。反向传播算法的工作原理是，通过链规则计算损失函数相对于每个权重的梯度，一次计算一层，从最后一层开始向后迭代，以避免链规则中中间项的冗余计算。

O’Reilly报告：知识图谱崛起——面向现代数据集成和数据结构体系，“The Rise of the Knowledge Graph——Toward Modern Data Integration and the Data Fabric Architecture”

专知会员服务

49+阅读 · 2022年2月18日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日