Diffusion probabilistic models (DPMs) have shown remarkable results on various image synthesis tasks such as text-to-image generation and image inpainting. However, compared to other generative methods like VAEs and GANs, DPMs lack a low-dimensional, interpretable, and well-decoupled latent code. Recently, diffusion autoencoders (Diff-AE) were proposed to explore the potential of DPMs for representation learning via autoencoding. Diff-AE provides an accessible latent space that exhibits remarkable interpretability, allowing image attributes to be manipulated through latent codes from that space. However, previous works are not generic, as they operate on only a few attributes. To further explore the latent space of Diff-AE and achieve a generic editing pipeline, we propose a module called Group-supervised AutoEncoder (dubbed GAE) for Diff-AE to achieve better disentanglement of the latent code. GAE is trained via an attribute-swap strategy to acquire latent codes for example-based multi-attribute image manipulation. We empirically demonstrate that our method enables multi-attribute manipulation and achieves convincing sample quality and attribute alignment, while significantly reducing computational requirements compared to pixel-based approaches for representational decoupling. Code will be released soon.
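The abstract does not say how the latent code is partitioned or how the swap is carried out during training. As a rough illustration of the general idea of an attribute swap, the sketch below assumes that each attribute owns a fixed, contiguous slice of the latent code and exchanges that slice between two codes. The function name `swap_attribute`, the slice layout, and the attribute names are all hypothetical and are not taken from the paper or its code.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: attribute name -> (start, end) slice of the latent code.
# The real partitioning used by GAE is not described in the abstract.
AttributeSlices = Dict[str, Tuple[int, int]]


def swap_attribute(
    z_a: List[float],
    z_b: List[float],
    slices: AttributeSlices,
    attribute: str,
) -> Tuple[List[float], List[float]]:
    """Exchange the slice belonging to `attribute` between two latent codes.

    Returns two new codes and leaves the inputs unmodified.
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent codes must have the same length")
    start, end = slices[attribute]
    # Code A keeps everything except the chosen slice, which comes from B.
    new_a = z_a[:start] + z_b[start:end] + z_a[end:]
    # Code B receives the corresponding slice from A.
    new_b = z_b[:start] + z_a[start:end] + z_b[end:]
    return new_a, new_b
```

In a setup like this, decoding the swapped codes would be expected to yield images that differ from the originals only in the swapped attribute, which is the kind of signal a swap-based training objective can supervise. Whether GAE uses equal-sized or contiguous slices is an assumption made here for illustration only.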