Simultaneous Image-to-Zero and Zero-to-Noise: Diffusion Models with Analytical Image Attenuation

Recent studies have demonstrated that the forward diffusion process is crucial to the generative quality and sampling efficiency of diffusion models. We propose incorporating an analytical image attenuation process into the forward diffusion process for high-quality (un)conditioned image generation with significantly fewer denoising steps than the vanilla diffusion model, which requires thousands of steps. In a nutshell, our method represents the forward image-to-noise mapping as a simultaneous \textit{image-to-zero} mapping and \textit{zero-to-noise} mapping. Under this framework, we mathematically derive 1) the training objectives and 2) the reverse-time sampling formula, based on an analytical attenuation function that models the image-to-zero mapping. The former enables our method to learn the noise and image components simultaneously, which simplifies learning. Importantly, because the \textit{zero-to-image} sampling function in the latter is analytic, we can avoid ordinary differential equation-based accelerators and instead naturally perform sampling with an arbitrary step size. We have conducted extensive experiments on unconditioned image generation, \textit{e.g.}, CIFAR-10 and CelebA-HQ-256, and on image-conditioned downstream tasks such as super-resolution, saliency detection, edge detection, and image inpainting. The proposed diffusion models achieve competitive generative quality with far fewer denoising steps than the state of the art, thus greatly accelerating generation. In particular, to generate images of comparable quality, our models require only one-twentieth of the denoising steps of the baseline denoising diffusion probabilistic models. Moreover, we achieve state-of-the-art performance on the image-conditioned tasks using no more than 10 steps.
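The decomposition described in the abstract can be illustrated with a small sketch. The abstract does not give the attenuation function or the noise schedule, so the code below assumes, purely for illustration, a linear attenuation `(1 - t) * x0` and a noise standard deviation of `sqrt(t)` on `t` in `[0, 1]`. The function names (`attenuate`, `noise_scale`, `forward_sample`, `reverse_step`) are hypothetical, and `reverse_step` is a simplified deterministic recomposition, not the sampling formula derived in the paper.

```python
import math
import random


def attenuate(x0, t):
    """Image-to-zero part. ASSUMPTION: linear attenuation, x0 at t=0 and 0 at t=1."""
    return [(1.0 - t) * v for v in x0]


def noise_scale(t):
    """Zero-to-noise part. ASSUMPTION: noise std sqrt(t), 0 at t=0 and 1 at t=1."""
    return math.sqrt(t)


def forward_sample(x0, t, rng):
    """Draw x_t as the attenuated image plus scaled Gaussian noise.

    Returns both x_t and the noise that was used, since a model trained in
    this setting would be supervised on the image and noise components.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    scale = noise_scale(t)
    xt = [a + scale * e for a, e in zip(attenuate(x0, t), eps)]
    return xt, eps


def reverse_step(x0_hat, eps_hat, s):
    """Recompose the state at an arbitrary earlier time s.

    ASSUMPTION: a deterministic simplification. Given estimates of the image
    and noise components, both closed-form schedules are re-evaluated at s,
    so the step size is not tied to a fixed grid.
    """
    scale = noise_scale(s)
    return [a + scale * e for a, e in zip(attenuate(x0_hat, s), eps_hat)]


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [0.5, -1.0, 0.25]  # toy "image" with three pixels
    xt, eps = forward_sample(x0, 0.8, rng)
    # With oracle estimates of the two components, a single jump to s=0
    # recovers the original image.
    print(reverse_step(x0, eps, 0.0))
```

In this toy setting the two components are supplied directly; in practice they would come from a trained network, and the quality of a large jump depends on how accurate those estimates are.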

