Diffusion generative models take their name from the physical phenomenon of diffusion, which describes particle movement, and they inherit its character: along the denoising trajectory, samples follow a stochastic random walk in data space. As a result, image regions interfere with one another during generation. This intrinsic mutual interference conflicts with practical downstream applications that must preserve low-level pixel information from the given conditioning, for example customization tasks such as personalized generation and inpainting based on a single user-provided image. In this work, we investigate the properties of diffusion (physics) in diffusion (machine learning) and propose Cyclic One-Way Diffusion (COW), a method that controls the direction of the diffusion phenomenon in a pre-trained, frozen diffusion model for versatile customization scenarios in which low-level pixel information from the conditioning must be preserved. Notably, unlike most current methods, which incorporate additional conditions by fine-tuning the base text-to-image diffusion model or by learning auxiliary networks, our method offers a new perspective on what the task requires and applies to a wider range of customization scenarios in a learning-free manner. Extensive experimental results show that COW achieves more flexible customization under strict visual conditions across different application settings.
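To make the notion of "one-way" diffusion concrete, the following is a minimal toy sketch in the physical setting only. It is not the COW algorithm, whose details the abstract does not give, and it does not involve a learned diffusion model. It assumes a 1D signal, an explicit heat-equation update, and a hypothetical "conditioning" patch; the names `diffuse_step` and `one_way_diffusion`, the rate of 0.25, and the step count are all illustrative choices. The sketch contrasts unconstrained diffusion, where the patch is blurred by its surroundings, with a variant that re-imposes the patch after every step, so that information flows out of the patch but never back into it.

```python
import numpy as np


def diffuse_step(x, rate=0.25):
    """One explicit heat-equation step on a 1D signal with replicated edges."""
    padded = np.pad(x, 1, mode="edge")
    # Discrete Laplacian: left neighbour - 2 * centre + right neighbour.
    return x + rate * (padded[:-2] - 2.0 * x + padded[2:])


def one_way_diffusion(x0, seed_mask, seed_values, steps=200, rate=0.25):
    """Diffuse x0 while re-imposing seed_values on seed_mask after every step.

    Resetting the seed region each step means the seed influences its
    surroundings, but the surroundings never alter the seed.
    """
    x = x0.astype(float).copy()
    x[seed_mask] = seed_values
    for _ in range(steps):
        x = diffuse_step(x, rate)
        x[seed_mask] = seed_values  # undo any interference inside the seed
    return x


# Toy setup (hypothetical): 32 zeros with a "conditioning" patch of ones.
signal = np.zeros(32)
mask = np.zeros(32, dtype=bool)
mask[12:20] = True
patch = np.ones(8)

# Unconstrained diffusion: the patch mixes with its surroundings and fades.
free = signal.copy()
free[mask] = patch
for _ in range(200):
    free = diffuse_step(free)

# One-way diffusion: the patch is preserved and spreads outward.
held = one_way_diffusion(signal, mask, patch)
```

In this toy, `free[mask]` ends well below its initial value of one, while `held[mask]` is exactly the original patch and the samples outside the mask have absorbed part of it. How COW realizes the analogous control inside a frozen text-to-image model is not described in the abstract and is not implied by this sketch.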