Guidance is an error-correcting technique used to improve the perceptual quality of images generated by diffusion models. Typically, the correction is achieved by linear extrapolation, using an auxiliary diffusion model with lower performance than the primary model. Using a 2D toy example, we show that guidance is most beneficial when the auxiliary model exhibits errors similar to the primary model's, but stronger. We verify this finding in higher dimensions, where we show that generative performance competitive with state-of-the-art guidance methods can be achieved when the auxiliary model differs from the primary one only by having stronger weight regularization. As an independent contribution, we investigate whether upweighting long-range spatial dependencies improves visual fidelity. The result is a novel guidance method, which we call sliding window guidance (SWG), that guides the primary model with itself by constraining its receptive field. Intriguingly, SWG aligns better with human preferences than state-of-the-art guidance methods while requiring neither training, architectural modifications, nor class conditioning. The code will be released.
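The linear-extrapolation form of guidance mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, signature, and toy models are assumptions, and `w` plays the role of a guidance weight extrapolating from the auxiliary (weaker) model's prediction toward the primary model's.

```python
import numpy as np

def guided_prediction(primary, auxiliary, x, w):
    """Sketch of guidance by linear extrapolation (illustrative, not the paper's API).

    The guided output moves from the auxiliary model's prediction in the
    direction of the primary model's prediction, scaled by the weight w.
    With w = 1 the primary model is recovered; w > 1 extrapolates past it.
    """
    d_main = primary(x)   # prediction of the primary (stronger) model
    d_aux = auxiliary(x)  # prediction of the auxiliary (weaker) model
    return d_aux + w * (d_main - d_aux)

# Toy stand-ins for the two denoisers, purely for illustration:
primary = lambda x: 2.0 * x
auxiliary = lambda x: x
x = np.array([1.0, -1.0])
out = guided_prediction(primary, auxiliary, x, w=2.0)
```

With `w = 2.0` the example extrapolates one full step beyond the primary model's prediction, which is the mechanism the abstract's guidance methods (including SWG, where the auxiliary model is the primary model with a restricted receptive field) build on.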