Recent advances in diffusion-based generative models have shown remarkable promise for zero-shot image-to-image translation and editing. Most of these approaches work by combining or replacing the network's internal features used in the generation of new images with features extracted by inverting a guide image. Methods of this type are considered the current state of the art among training-free approaches, but they have notable limitations: they tend to be costly in runtime and memory, and they often depend on deterministic sampling that limits variation in generated results. We propose Filter-Guided Diffusion (FGD), an alternative approach that leverages fast filtering operations during the diffusion process to support finer control over the strength and frequencies of guidance, and that can work with non-deterministic samplers to produce greater variety. Owing to its efficiency, FGD can be run across multiple seeds and hyperparameter settings in less time than a single run of other state-of-the-art methods, producing superior results on structural and semantic metrics. We conduct extensive quantitative and qualitative experiments to evaluate the performance of FGD on translation tasks and also demonstrate its potential for localized editing when combined with masks. Project page: https://filterguideddiffusion.github.io/
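Since the abstract describes the technique only at a high level, the following is a minimal sketch of what filter-based guidance during sampling could look like. It assumes a simple separable Gaussian low-pass filter over PyTorch tensors; the function names (`gaussian_blur`, `filter_guided_step`), the blending scheme, and the `sigma`/`strength` parameters are illustrative assumptions, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F


def gaussian_blur(x: torch.Tensor, sigma: float) -> torch.Tensor:
    """Separable Gaussian low-pass filter over the spatial dims of x (B, C, H, W)."""
    radius = max(1, int(3 * sigma))
    t = torch.arange(-radius, radius + 1, dtype=x.dtype, device=x.device)
    k = torch.exp(-(t ** 2) / (2 * sigma ** 2))
    k = (k / k.sum()).view(1, 1, 1, -1)          # (1, 1, 1, K) horizontal kernel
    c = x.shape[1]
    # Horizontal then vertical pass, one grouped conv per channel.
    x = F.conv2d(x, k.expand(c, 1, 1, -1), padding=(0, radius), groups=c)
    x = F.conv2d(x, k.transpose(2, 3).expand(c, 1, -1, 1), padding=(radius, 0), groups=c)
    return x


def filter_guided_step(x0_pred: torch.Tensor, guide: torch.Tensor,
                       sigma: float = 2.0, strength: float = 0.5) -> torch.Tensor:
    """Hypothetical guidance step: pull the low-frequency band of the sampler's
    clean-image estimate toward the guide's low frequencies, leaving the
    high-frequency band (newly generated detail) untouched."""
    low_pred = gaussian_blur(x0_pred, sigma)
    low_guide = gaussian_blur(guide, sigma)
    # strength controls how much of the guide's low-frequency content is imposed;
    # sigma controls which frequency band counts as "structure" to preserve.
    return x0_pred + strength * (low_guide - low_pred)
```

In an actual sampler loop, `x0_pred` would be the denoiser's clean-image estimate at the current timestep, and `strength` and `sigma` could be scheduled over timesteps to control which frequency bands of the guide survive into the final image; because each step is just a filtering operation, this kind of guidance is cheap enough to rerun over many seeds and hyperparameter settings.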