Diffusion models are proficient at generating high-quality images. However, they are effective only when operating at the resolution used during training. Inference at a scaled resolution leads to repetitive patterns and structural distortions, and retraining at higher resolutions quickly becomes prohibitively expensive. Methods that enable pre-existing diffusion models to operate at flexible test-time resolutions are therefore highly desirable. Previous works suffer from frequent artifacts and often introduce large latency overheads. We propose two simple modules that combine to solve these issues: a Frequency Modulation (FM) module that leverages the Fourier domain to improve global structure consistency, and an Attention Modulation (AM) module that improves the consistency of local texture patterns, a problem largely ignored in prior works. Our method, coined Fam diffusion, integrates seamlessly into any latent diffusion model and requires no additional training. Extensive qualitative results highlight the effectiveness of our method in addressing structural and local artifacts, while quantitative results show state-of-the-art performance. Moreover, our method avoids the redundant inference tricks that prior works rely on for improved consistency, such as patch-based or progressive generation, and therefore incurs negligible latency overhead.
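To make the idea of operating in the Fourier domain concrete, the following is a minimal sketch of one generic way to combine two signals by frequency band: low frequencies (coarse global structure) are taken from a reference and high frequencies (fine detail) from a target. This is an illustrative assumption, not the FM module itself, whose details are not given here. The function names `lowpass_mask` and `blend_frequencies`, the hard circular mask, and the `cutoff` value are all hypothetical choices made for this example.

```python
import numpy as np


def lowpass_mask(height, width, cutoff):
    """Boolean mask, True where the radial frequency (cycles/sample) is <= cutoff.

    Hypothetical helper: a hard circular low-pass mask in unshifted FFT layout.
    """
    fy = np.fft.fftfreq(height)[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(width)[None, :]   # horizontal frequencies
    return np.sqrt(fy**2 + fx**2) <= cutoff


def blend_frequencies(reference, target, cutoff=0.1):
    """Take low frequencies from `reference` and high frequencies from `target`.

    Both arrays must share the same shape; the FFT runs over the last two axes,
    so (H, W) and (C, H, W) inputs both work.
    """
    if reference.shape != target.shape:
        raise ValueError("reference and target must have the same shape")
    mask = lowpass_mask(*reference.shape[-2:], cutoff)
    ref_f = np.fft.fft2(reference)
    tgt_f = np.fft.fft2(target)
    mixed = np.where(mask, ref_f, tgt_f)  # per-frequency selection
    # The mask is symmetric under frequency negation, so for real inputs the
    # mixed spectrum stays Hermitian and the imaginary part is round-off only.
    return np.fft.ifft2(mixed).real
```

In this sketch, `reference` would typically be an upsampled version of a result produced at the training resolution and `target` a result produced at the higher test resolution; that pairing is likewise an assumption made for illustration.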