Recent advances in diffusion-based generative models have shown great promise for image-to-image (I2I) translation and editing. Most recent work in this space relies on additional training or architecture-specific adjustments to the diffusion process. In this work, we show that much of this low-level control can be achieved without additional training or any access to the features of the diffusion model. Our method, Filter-Guided Diffusion (FGD), adaptively applies a filter to the input of each diffusion step based on the output of the previous step. Notably, this approach does not depend on any specific architecture or sampler and requires no access to the network's internal features, making it easy to combine with other techniques, samplers, and diffusion architectures. Furthermore, it adds negligible performance overhead and allows for more continuous adjustment of guidance strength than other approaches. We show that FGD offers a fast and strong baseline that is competitive with recent architecture-dependent approaches. FGD can also be used as a simple add-on to enhance the structural guidance of other state-of-the-art I2I methods. Finally, our derivation of this method helps explain the impact of self-attention, a key component of other recent architecture-specific I2I approaches, in a more architecture-independent way. Project page: https://github.com/jaclyngu/FilteredGuidedDiffusion
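
To make the idea of filtering the input of each diffusion step more concrete, the following is a minimal sketch and not the implementation from the project page. It assumes a plain box blur as the filter, a single-channel image, and a toy stand-in for the diffusion step; the names `box_blur`, `filter_guided_step`, `toy_denoiser`, and `sample` are hypothetical and chosen only for illustration. The denoiser is treated as a black box, which mirrors the claim that no internal features are needed, and `strength` is a continuous knob for the amount of guidance.

```python
import numpy as np

def box_blur(img, radius):
    """Separable box blur with edge padding (an assumed stand-in low-pass filter)."""
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    padded = np.pad(img, radius, mode="edge")
    # Blur along rows, then along columns; "valid" undoes the padding.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def filter_guided_step(x, guide, strength, radius=2):
    """Nudge the low-frequency structure of x toward that of the guide image."""
    residual = box_blur(guide, radius) - box_blur(x, radius)
    return x + strength * residual

def toy_denoiser(x, t):
    """Hypothetical placeholder for one black-box diffusion step."""
    return 0.9 * x

def sample(guide, steps=10, strength=0.5, seed=0):
    """Run the toy sampler, filtering the input of every step."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(guide.shape)
    for t in range(steps, 0, -1):
        # The input of this step is filtered using the previous step's output.
        x = filter_guided_step(x, guide, strength)
        x = toy_denoiser(x, t)
    return x

guide = np.ones((16, 16))
result = sample(guide, steps=10, strength=0.5)
```

Setting `strength=0.0` leaves the sampler untouched, and intermediate values blend the guide structure in gradually. In this sketch the guidance wraps the sampler and never enters the denoiser.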