Classifier-free guidance (CFG) is the de facto standard for conditional sampling in diffusion models, yet it often leads to a loss of diversity in generated samples. We formalize this phenomenon as generative distortion, defined as the mismatch between the CFG-induced sampling distribution and the true conditional distribution. Considering Gaussian mixtures and their exact scores, and leveraging tools from statistical physics, we characterize the onset of distortion in a high-dimensional regime as a function of the number of classes. Our analysis reveals that distortions emerge through a phase transition in the effective potential governing the guided dynamics. In particular, our dynamical mean-field analysis shows that distortion persists when the number of modes grows exponentially with dimension, but vanishes in the sub-exponential regime. Consistent with prior finite-dimensional results, we further demonstrate that vanilla CFG shifts the mean and shrinks the variance of the conditional distribution. We show that standard CFG schedules are fundamentally incapable of preventing variance shrinkage. Finally, we propose a theoretically motivated guidance schedule featuring a negative-guidance window, which mitigates loss of diversity while preserving class separability.
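The vanilla CFG mechanism the abstract analyzes can be made concrete in a minimal sketch. Below, a 1-D equal-weight Gaussian mixture with unit component variance stands in for the class-conditional setup; the exact mixture score plays the unconditional score and a single component plays the conditional one, combined with the common convention s_guided = s_uncond + w·(s_cond − s_uncond). The specific means, weights, and evaluation point are illustrative assumptions, not the paper's setup.

```python
import math

def gaussian_score(x, mu, var):
    # Score d/dx log N(x; mu, var) of a single Gaussian component.
    return (mu - x) / var

def gmm_score(x, means, var):
    # Exact score of an equal-weight 1-D Gaussian mixture:
    # a responsibility-weighted average of component scores
    # (log-sum-exp trick for numerical stability).
    logps = [-0.5 * (x - m) ** 2 / var for m in means]
    mx = max(logps)
    resp = [math.exp(lp - mx) for lp in logps]
    z = sum(resp)
    return sum(r * gaussian_score(x, m, var) for r, m in zip(resp, means)) / z

def cfg_score(x, means, cond_idx, var, w):
    # Vanilla classifier-free guidance: linearly extrapolate from the
    # unconditional (mixture) score toward the conditional
    # (single-component) score with guidance weight w.
    s_uncond = gmm_score(x, means, var)
    s_cond = gaussian_score(x, means[cond_idx], var)
    return s_uncond + w * (s_cond - s_uncond)
```

For w > 1 the guided score pulls toward the conditioned mode more strongly than the exact conditional score does, which is the over-concentration (variance shrinkage) the abstract formalizes as generative distortion; w = 0 and w = 1 recover the unconditional and conditional scores, respectively.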