Diffusion models do not recover semantic structure uniformly over time. Instead, samples transition from semantic ambiguity to class commitment within a narrow regime. Recent theoretical work attributes this transition to dynamical instabilities along class-separating directions, but practical methods for detecting and exploiting these windows in trained models remain limited. We show that tracking the class-conditional entropy of a latent semantic variable given the noisy state provides a reliable signature of these transition regimes. Restricting the entropy to semantic partitions further resolves semantic decisions at different levels of abstraction. We analyze this behavior in high-dimensional Gaussian mixture models and show that the entropy rate concentrates on the same logarithmic time scale as the speciation (symmetry-breaking) instability previously identified in variance-preserving diffusion. We validate our method on EDM2-XS and Stable Diffusion 1.5, where class-conditional entropy consistently isolates the noise regimes critical for semantic structure formation. Finally, we use our framework to quantify how guidance redistributes semantic information over time. Together, these results connect information-theoretic and statistical-physics perspectives on diffusion and provide a principled basis for time-localized control.
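As an illustration of the quantity being tracked, the following is a minimal sketch, not the authors' implementation, of a Monte Carlo estimate of the class-conditional entropy H(C | x_t) in a toy setting. Everything specific here is an assumption made for illustration: a symmetric two-class Gaussian mixture with means ±mu and |mu|^2 = dim, isotropic within-class variance sigma^2, and a variance-preserving forward process x_t = e^{-t} x_0 + sqrt(1 - e^{-2t}) eps. The function name `conditional_entropy` and its parameters are hypothetical.

```python
import numpy as np


def conditional_entropy(t, dim, sigma=1.0, n_samples=200_000, seed=0):
    """Monte Carlo estimate of H(C | x_t) in bits.

    Toy model (an assumption of this sketch): C is uniform on {+1, -1},
    x_0 = C * mu + sigma * z with |mu|^2 = dim, and the forward process is
    x_t = exp(-t) * x_0 + sqrt(1 - exp(-2t)) * eps.
    """
    rng = np.random.default_rng(seed)
    mu_norm_sq = float(dim)
    a = np.exp(-t)                        # signal scale at time t
    v = sigma**2 * a**2 + 1.0 - a**2      # per-coordinate variance of x_t given C

    # Only the projection of x_t onto mu matters for the posterior over C.
    # By symmetry it is enough to sample conditioned on C = +1:
    #   mu . x_t = a * |mu|^2 + sqrt(v) * |mu| * g,   g ~ N(0, 1)
    g = rng.standard_normal(n_samples)
    proj = a * mu_norm_sq + np.sqrt(v * mu_norm_sq) * g

    # Posterior log-odds: log p(C=+1 | x_t) - log p(C=-1 | x_t)
    logit = 2.0 * a * proj / v

    # Binary entropy (nats) as a function of |logit|, written to stay
    # numerically stable for large |logit|:
    #   h = log(1 + e^{-l}) + l * e^{-l} / (1 + e^{-l})
    l = np.abs(logit)
    e = np.exp(-l)
    h_nats = np.log1p(e) + l * e / (1.0 + e)
    return float(h_nats.mean() / np.log(2.0))


if __name__ == "__main__":
    dim = 1000
    for t in np.linspace(0.0, 6.0, 13):
        print(f"t={t:4.1f}  H(C|x_t) = {conditional_entropy(t, dim):.3f} bits")
```

In this toy model with sigma = 1, the log-odds depend on t and dim only through e^{-2t} * dim, so the entropy rises from near 0 bits to near 1 bit around t ≈ (1/2) log(dim), and the transition window shifts logarithmically with dimension. For a trained model the posterior over C is not available in closed form and would have to be estimated, which this sketch does not attempt.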