Towards Enhanced Controllability of Diffusion Models

Denoising diffusion models have shown remarkable capabilities in generating realistic, high-quality, and diverse images. However, the controllability and editability of diffusion models remain underexplored relative to GANs. Inspired by image manipulation techniques based on the latent space of GAN models, we propose to train a diffusion model conditioned on two latent codes: a spatial content mask and a flattened style embedding. We rely on the inductive bias of the progressive denoising process of diffusion models to encode pose/layout information in the spatial structure mask and semantic/style information in the style code. We extend the sampling technique from composable diffusion models to allow for some dependence between conditional inputs. This significantly improves generation quality while also providing control over the amount of guidance from each latent code separately as well as from their joint distribution. To further enhance controllability, we vary the level of guidance for the structure and style latents based on the denoising timestep. We observe greater controllability than with existing methods and show that, without explicit training objectives, diffusion models can be leveraged for effective image manipulation, reference-based image translation, and style transfer.
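To make the sampling idea concrete, the following is a minimal sketch of how separate and joint guidance terms might be combined at one denoising step. It is not the authors' formulation. The function names, the linear weight schedule, and the additive form (a classifier-free-guidance-style sum over the structure-only, style-only, and joint predictions) are all assumptions made for illustration. The inputs stand for the noise predictions a model would produce under each conditioning.

```python
def timestep_weights(t, num_steps, w_structure_max=3.0, w_style_max=3.0):
    """Hypothetical schedule (an assumption, not taken from the paper).

    Structure guidance is strongest at the noisiest step (t = num_steps - 1)
    and style guidance is strongest at the final step (t = 0).
    """
    frac = t / (num_steps - 1)  # 1.0 at the noisiest step, 0.0 at the last
    return w_structure_max * frac, w_style_max * (1.0 - frac)


def combine_guidance(eps_uncond, eps_structure, eps_style, eps_joint,
                     w_structure, w_style, w_joint):
    """Blend noise predictions from four conditioning settings.

    Each eps_* argument is the model's noise prediction with no condition,
    the structure code only, the style code only, or both codes together.
    Works on floats or on arrays that support elementwise arithmetic.
    The additive form is an illustrative assumption.
    """
    return (eps_uncond
            + w_structure * (eps_structure - eps_uncond)
            + w_style * (eps_style - eps_uncond)
            + w_joint * (eps_joint - eps_uncond))


# Example with scalar stand-ins for the four noise predictions.
w_s, w_g = timestep_weights(t=9, num_steps=10)
eps = combine_guidance(1.0, 2.0, 3.0, 4.0, w_s, w_g, w_joint=1.0)
```

Under these assumptions, setting all three weights to zero recovers the unconditional prediction, and raising one weight alone strengthens the influence of that latent code independently of the other.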

