Classifier-free guidance (CFG) is crucial for improving both generation quality and alignment between the input condition and final output in diffusion models. While a high guidance scale is generally required to enhance these aspects, it also causes oversaturation and unrealistic artifacts. In this paper, we revisit the CFG update rule and introduce modifications to address this issue. We first decompose the update term in CFG into parallel and orthogonal components with respect to the conditional model prediction and observe that the parallel component primarily causes oversaturation, while the orthogonal component enhances image quality. Accordingly, we propose down-weighting the parallel component to achieve high-quality generations without oversaturation. Additionally, we draw a connection between CFG and gradient ascent and introduce a new rescaling and momentum method for the CFG update rule based on this insight. Our approach, termed adaptive projected guidance (APG), retains the quality-boosting advantages of CFG while enabling the use of higher guidance scales without oversaturation. APG is easy to implement and introduces practically no additional computational overhead to the sampling process. Through extensive experiments, we demonstrate that APG is compatible with various conditional diffusion models and samplers, leading to improved FID, recall, and saturation scores while maintaining precision comparable to CFG, making our method a superior plug-and-play alternative to standard classifier-free guidance.
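The decomposition described above can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so everything here is an assumption for illustration rather than the authors' implementation: the function names, the parameter names (`eta` for the weight on the parallel component, `norm_threshold` for the rescaling radius, `momentum` for the running-average coefficient), and the exact order of the momentum, rescaling, and projection steps. Standard CFG is written in its usual form, `pred_uncond + scale * (pred_cond - pred_uncond)`, and the sketch is arranged so that `eta=1` with no rescaling or momentum reproduces it.

```python
import numpy as np


def project(v, ref, eps=1e-12):
    """Split v into parts parallel and orthogonal to ref, per sample in the batch."""
    batch = v.shape[0]
    v_flat = v.reshape(batch, -1)
    ref_flat = ref.reshape(batch, -1)
    ref_unit = ref_flat / (np.linalg.norm(ref_flat, axis=1, keepdims=True) + eps)
    coeff = np.sum(v_flat * ref_unit, axis=1, keepdims=True)
    parallel = (coeff * ref_unit).reshape(v.shape)
    orthogonal = v - parallel
    return parallel, orthogonal


def cfg(pred_cond, pred_uncond, scale):
    """Standard classifier-free guidance update."""
    return pred_uncond + scale * (pred_cond - pred_uncond)


def apg_step(pred_cond, pred_uncond, scale, eta=0.0,
             norm_threshold=None, momentum=0.0, running=None):
    """One guided prediction in the spirit of APG (hypothetical sketch).

    eta, norm_threshold, and momentum are assumed names, not taken from the source.
    Returns the guided prediction and the running update to pass to the next step.
    """
    diff = pred_cond - pred_uncond

    # Momentum on the update term (assumed form: add a scaled copy of the previous update).
    if running is not None:
        diff = diff + momentum * running
    running = diff

    # Rescaling: shrink the update so its per-sample norm does not exceed the threshold.
    if norm_threshold is not None:
        norms = np.linalg.norm(diff.reshape(diff.shape[0], -1), axis=1)
        factor = np.minimum(1.0, norm_threshold / (norms + 1e-12))
        diff = diff * factor.reshape(-1, *([1] * (diff.ndim - 1)))

    # Projection: down-weight the component parallel to the conditional prediction.
    parallel, orthogonal = project(diff, pred_cond)
    update = orthogonal + eta * parallel

    guided = pred_cond + (scale - 1.0) * update
    return guided, running
```

In a sampling loop, `running` would be initialised to `None` and the value returned by each call passed into the next one. Setting `eta=0` removes the parallel component entirely, while values between 0 and 1 only down-weight it.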