Improving Diffusion-Based Image Synthesis with Context Prediction

Diffusion models are a new class of generative models, and have dramatically promoted image generation with unprecedented quality and diversity. Existing diffusion models mainly try to reconstruct input image from a corrupted one with a pixel-wise or feature-wise constraint along spatial axes. However, such point-based reconstruction may fail to make each predicted pixel/feature fully preserve its neighborhood context, impairing diffusion-based image synthesis. As a powerful source of automatic supervisory signal, context has been well studied for learning representations. Inspired by this, we for the first time propose ConPreDiff to improve diffusion-based image synthesis with context prediction. We explicitly reinforce each point to predict its neighborhood context (i.e., multi-stride features/tokens/pixels) with a context decoder at the end of diffusion denoising blocks in training stage, and remove the decoder for inference. In this way, each point can better reconstruct itself by preserving its semantic connections with neighborhood context. This new paradigm of ConPreDiff can generalize to arbitrary discrete and continuous diffusion backbones without introducing extra parameters in sampling procedure. Extensive experiments are conducted on unconditional image generation, text-to-image generation and image inpainting tasks. Our ConPreDiff consistently outperforms previous methods and achieves a new SOTA text-to-image generation results on MS-COCO, with a zero-shot FID score of 6.21.

翻译：扩散模型是一类新型生成模型，凭借其前所未有的图像质量和多样性显著推动了图像生成领域的发展。现有扩散模型主要尝试通过沿空间轴施加像素级或特征级约束，从受损图像重建输入图像。然而，这种基于点的重建方式可能无法使每个预测像素/特征完整保留其邻域上下文信息，从而损害基于扩散模型的图像合成效果。上下文作为强大的自动监督信号源，在表征学习领域已得到广泛研究。受此启发，我们首次提出ConPreDiff方法，通过上下文预测改进基于扩散模型的图像合成。我们在训练阶段的扩散去噪块末端明确增强每个点对其邻域上下文（即多步长特征/标记/像素）的预测能力，并引入上下文解码器，在推理阶段移除该解码器。通过这种方式，每个点能通过保留与邻域上下文的语义连接实现更优自重建。这种ConPreDiff新范式可泛化至任意离散与连续扩散骨干网络，且在采样过程中不引入额外参数。在无条件图像生成、文本到图像生成及图像修复任务上进行了大量实验，我们的ConPreDiff始终优于以往方法，并在MS-COCO数据集上以6.21的零样本FID分数实现了新的文本到图像生成最先进水平。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日