Guidance is an error-correcting technique used to improve the perceptual quality of images generated by diffusion models. Typically, the correction is achieved by linear extrapolation, using an auxiliary diffusion model whose performance is lower than that of the primary model. Using a 2D toy example, we show that guidance is highly beneficial when the auxiliary model exhibits errors similar to those of the primary model, but stronger. We verify this finding in higher dimensions, where we show that generative performance competitive with state-of-the-art guidance methods can be achieved when the auxiliary model differs from the primary one only in having stronger weight regularization. As an independent contribution, we investigate whether upweighting long-range spatial dependencies improves visual fidelity. The result is a novel guidance method, which we call sliding window guidance (SWG), that guides the primary model with itself by constraining its receptive field. Intriguingly, SWG aligns better with human preferences than state-of-the-art guidance methods while requiring neither training, architectural modifications, nor class conditioning. The code will be released.
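The linear-extrapolation step mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the guidance weight `w`, and the toy 2D values are all hypothetical. The idea is that extrapolating away from a weaker auxiliary model's prediction, past the primary model's prediction, can cancel errors that both models share.

```python
import numpy as np

def guided_prediction(eps_primary, eps_aux, w):
    """Linear extrapolation used in guidance: push the primary model's
    prediction away from the (weaker) auxiliary model's prediction.
    w = 1 recovers the primary model; w > 1 amplifies the correction."""
    return eps_aux + w * (eps_primary - eps_aux)

# Hypothetical 2D toy values: the auxiliary model makes an error similar
# to the primary model's, but stronger, so extrapolating past the primary
# prediction moves the estimate further in the corrective direction.
eps_primary = np.array([1.0, 2.0])  # primary model's prediction
eps_aux = np.array([1.5, 2.5])      # auxiliary model's (worse) prediction
print(guided_prediction(eps_primary, eps_aux, 2.0))  # → [0.5 1.5]
```

With `w = 1` the auxiliary term cancels and the primary prediction is returned unchanged; values above 1 trade fidelity to the primary model for error correction.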