Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models (DPMs) with user-provided concepts. This paper addresses the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated content. Since the sampling procedure of DPMs involves recursive calls to the denoising UNet, naive gradient backpropagation requires storing the intermediate states of all iterations, resulting in extremely high memory consumption. To overcome this issue, we propose a novel method, AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs. It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the model's parameters (including conditioning signals, network weights, and initial noise) by solving another augmented ODE. To reduce numerical errors in both the forward generation and gradient backpropagation processes, we further reparameterize the probability-flow ODE and the augmented ODE as simple non-stiff ODEs using exponential integration. Finally, we demonstrate the effectiveness of AdjointDPM on three tasks: converting visual effects into identification text embeddings, finetuning DPMs for specific types of stylization, and optimizing initial noise to generate adversarial samples for security auditing.
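The adjoint-based backpropagation described in the abstract can be illustrated on a toy problem. The sketch below is not the AdjointDPM implementation: it assumes a scalar ODE dx/dt = -theta * x in place of the probability-flow ODE, a plain RK4 integrator in place of exponential integration, and a squared-error loss as the differentiable metric. All function names (`f`, `rk4`, `adjoint_gradients`) are hypothetical. It shows only the core idea: the forward solve keeps just the final state, and the gradients with respect to the initial state and the parameter are recovered by solving one augmented ODE backward in time.

```python
import math

# Toy dynamics dx/dt = f(x, theta) = -theta * x. This is an assumption for
# illustration; in a DPM, f would be the probability-flow ODE drift built
# from the denoising network.
def f(x, theta):
    return -theta * x

def df_dx(x, theta):
    return -theta

def df_dtheta(x, theta):
    return -x

def rk4(rhs, state, t0, t1, steps):
    """Integrate the autonomous system d(state)/dt = rhs(state) from t0 to t1."""
    h = (t1 - t0) / steps  # negative when integrating backward in time
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs([s + 0.5 * h * k for s, k in zip(state, k1)])
        k3 = rhs([s + 0.5 * h * k for s, k in zip(state, k2)])
        k4 = rhs([s + h * k for s, k in zip(state, k3)])
        state = [s + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state

def adjoint_gradients(x0, theta, target, T=1.0, steps=200):
    """Return (loss, dL/dx0, dL/dtheta) for L = 0.5 * (x(T) - target)^2."""
    # Forward pass: only the final state x(T) is kept, no trajectory is stored.
    xT = rk4(lambda s: [f(s[0], theta)], [x0], 0.0, T, steps)[0]
    loss = 0.5 * (xT - target) ** 2

    # Backward pass: augmented state (x, a, g), where a = dL/dx(t) is the
    # adjoint and g accumulates dL/dtheta. Terminal conditions at t = T.
    def augmented(s):
        x, a, _ = s
        return [f(x, theta),                # recompute x(t) backward in time
                -a * df_dx(x, theta),       # da/dt = -a * df/dx
                -a * df_dtheta(x, theta)]   # dg/dt = -a * df/dtheta

    _, a0, g0 = rk4(augmented, [xT, xT - target, 0.0], T, 0.0, steps)
    return loss, a0, g0

if __name__ == "__main__":
    x0, theta, target = 1.5, 0.7, 0.2
    loss, grad_x0, grad_theta = adjoint_gradients(x0, theta, target)
    # Closed-form reference for this toy ODE: x(T) = x0 * exp(-theta * T), T = 1.
    xT = x0 * math.exp(-theta)
    print("adjoint  dL/dtheta:", grad_theta)
    print("analytic dL/dtheta:", (xT - target) * (-xT))
```

Because the backward pass recomputes x(t) alongside the adjoint, memory use does not grow with the number of solver steps, which is the property the abstract relies on to avoid storing all intermediate sampling states.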