Classifier-free guidance (CFG) is a widely used technique for controllable generation in diffusion and flow-based models. Despite its empirical success, CFG relies on a heuristic linear extrapolation that is often sensitive to the guidance scale. In this work, we provide a principled interpretation of CFG through the lens of optimization. We show that the velocity field in flow matching corresponds to the gradient of a sequence of smoothed distance functions, which guides latent variables toward the scaled set of target images. This perspective reveals that the standard CFG formulation approximates this gradient, with the prediction gap (the discrepancy between conditional and unconditional outputs) governing sensitivity to the guidance scale. Leveraging this insight, we reformulate CFG sampling as homotopy optimization with a manifold constraint. This formulation necessitates a manifold projection step, which we implement via an incremental gradient descent scheme during sampling. To improve computational efficiency and stability, we further enhance this iterative process with Anderson acceleration, without requiring additional model evaluations. Our proposed methods are training-free and consistently improve generation fidelity, prompt alignment, and robustness to the guidance scale. We validate their effectiveness across diverse benchmarks, demonstrating significant improvements on large-scale models such as DiT-XL-2-256, Flux, and Stable Diffusion 3.5.
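The standard CFG extrapolation and the "prediction gap" mentioned above can be sketched in a few lines. This is a minimal illustration, not the paper's method: `v_cond` and `v_uncond` stand in for a model's conditional and unconditional velocity (or noise) predictions, and random vectors are used purely for demonstration.

```python
import numpy as np

# Illustrative stand-ins for the conditional and unconditional
# predictions of a flow/diffusion model at one sampling step.
rng = np.random.default_rng(0)
v_cond = rng.standard_normal(4)
v_uncond = rng.standard_normal(4)

def cfg_velocity(v_cond, v_uncond, w):
    """Standard CFG: linearly extrapolate along the prediction gap.

    w is the guidance scale; w = 1 recovers the conditional prediction,
    and w = 0 recovers the unconditional one.
    """
    gap = v_cond - v_uncond  # the "prediction gap" from the abstract
    return v_uncond + w * gap

# The gap's magnitude governs sensitivity to the guidance scale:
# the same change in w moves the guided prediction by |dw| * ||gap||.
assert np.allclose(cfg_velocity(v_cond, v_uncond, 1.0), v_cond)
```

Because the update is a pure extrapolation, large gaps amplify any change in `w`, which is one way to see why CFG outputs are sensitive to the guidance scale.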