Diffusion models have come to dominate the field of large generative image models, with prime examples such as Stable Diffusion and DALL-E 3 being widely adopted. These models are trained to perform text-conditioned generation on vast numbers of image-caption pairs and, as a byproduct, have acquired general knowledge about natural image statistics. However, when confronted with the task of constrained sampling, e.g. generating the right half of an image conditioned on the known left half, applying these models is a delicate and slow process, with previously proposed algorithms relying on expensive iterative operations that are usually orders of magnitude slower than text-based inference. This is counter-intuitive, as image-conditioned generation should rely less on the difficult-to-learn semantic knowledge that links captions to imagery, and should instead be achievable through lower-level correlations among image pixels. In practice, inverse models are trained or tuned separately for each inverse problem, e.g. by providing parts of images during training as an additional condition, to enable their application in realistic settings. We argue that this is unnecessary and propose an algorithm for fast constrained sampling in large pre-trained diffusion models (Stable Diffusion) that requires no expensive backpropagation operations through the model and produces results comparable even to those of state-of-the-art \emph{tuned} models. Our method is based on a novel optimization perspective on sampling under constraints and employs a numerical approximation to the expensive gradients, previously computed via backpropagation, yielding significant speed-ups.
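To make the contrast concrete, the sketch below illustrates the general idea of backpropagation-free constrained sampling on a toy problem. A linear shrinkage function stands in for the diffusion model's denoiser (it is not Stable Diffusion, and this is not the paper's actual algorithm), the constraint fixes the left half of a sample, and the gradient of the constraint loss through the denoiser is estimated by central finite differences rather than autodiff. All function names and step sizes here are illustrative assumptions.

```python
import numpy as np

# Toy "denoiser": linear shrinkage toward zero stands in for the diffusion
# model's denoising network (an illustrative stand-in, not a real model).
def denoise(x, t):
    return (1.0 - t) * x

# Constraint: the left half of the denoised sample must match known values.
def constraint_loss(x, y_left):
    half = len(x) // 2
    return 0.5 * np.sum((x[:half] - y_left) ** 2)

# Backpropagation-free gradient estimate: central finite differences through
# the composition denoise -> constraint_loss, avoiding autodiff entirely.
# (Coordinate-wise differences cost O(d) model calls; a practical method
# would use a cheaper numerical approximation, but the principle is the same.)
def fd_gradient(x, t, y_left, eps=1e-4):
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        f_plus = constraint_loss(denoise(x + e, t), y_left)
        f_minus = constraint_loss(denoise(x - e, t), y_left)
        grad[i] = (f_plus - f_minus) / (2 * eps)
    return grad

# Guided sampling loop: start from noise and repeatedly step against the
# estimated gradient so the left half is driven toward the known values,
# while the unconstrained right half is left untouched.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
y_left = np.ones(4)
for t in np.linspace(0.9, 0.1, 20):
    x = x - 0.5 * fd_gradient(x, t, y_left)

print(np.round(denoise(x, 0.1)[:4], 2))
```

Because the toy loss is quadratic, the finite-difference estimate here matches the analytic gradient almost exactly; the point of the sketch is only that constraint guidance can be computed from forward evaluations of the model alone, which is what removes the cost of backpropagation.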