Applying powerful generative denoising diffusion models (DDMs) to downstream tasks such as semantic image editing usually requires either fine-tuning pre-trained DDMs or learning auxiliary editing networks. In this work, we achieve state-of-the-art semantic control performance in various application settings by optimizing the denoising trajectory using only frozen DDMs. As one of the first optimization-based diffusion editing works, we begin by seeking a more comprehensive understanding of the intermediate high-dimensional latent spaces, analyzing their probabilistic and geometric behavior in the Markov chain both theoretically and empirically. We then explore the critical step in the denoising trajectory that characterizes the convergence of a pre-trained DDM. Finally, we present our method for finding semantic subspace boundaries for controllable manipulation, which guides the denoising trajectory towards the targeted boundary at the critical convergence step. We conduct extensive experiments on various DDM architectures (DDPM, iDDPM) and datasets (CelebA, CelebA-HQ, LSUN-church, LSUN-bedroom, AFHQ-dog) at different resolutions (64, 256) as empirical demonstrations.