What is a diffusion model actually doing when it turns noise into a photograph? We show that the deterministic DDIM reverse chain operates as a Partitioned Iterated Function System (PIFS) and that this framework serves as a unified design language for the schedules, architectures, and training objectives of denoising diffusion models. From the PIFS structure we derive three computable geometric quantities: a per-step contraction threshold $L^*_t$, a diagonal expansion function $f_t(\lambda)$, and a global expansion threshold $\lambda^{**}$. These quantities require no model evaluation and fully characterize the denoising dynamics. They give a structural explanation of the two-regime behavior of diffusion models: global context assembly at high noise via diffuse cross-patch attention, and fine-detail synthesis at low noise via patch-by-patch suppression release in strict variance order. Self-attention emerges as the natural primitive for PIFS contraction. The Kaplan-Yorke dimension of the PIFS attractor is determined analytically through a discrete Moran equation on the Lyapunov spectrum. From the fractal geometry of the PIFS we derive three optimal design criteria and show that four prominent empirical design choices (the cosine schedule offset, the resolution-dependent logSNR shift, Min-SNR loss weighting, and Align Your Steps sampling) each arise as approximate solutions to our explicit geometric optimization problems, turning theory into practice.
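For readers unfamiliar with the Kaplan-Yorke dimension mentioned in the abstract, the following is a minimal sketch of the textbook Kaplan-Yorke (Lyapunov) dimension computed from a given Lyapunov spectrum. It is not the paper's discrete Moran equation, whose form the abstract does not state; the function name `kaplan_yorke_dimension` and the example spectrum are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def kaplan_yorke_dimension(lyapunov_exponents: Sequence[float]) -> float:
    """Textbook Kaplan-Yorke dimension of a Lyapunov spectrum.

    Assumption: this is the standard definition
    D = j + (lambda_1 + ... + lambda_j) / |lambda_{j+1}|,
    where j is the largest index whose partial sum is non-negative.
    It is not the paper's Moran-equation formulation.
    """
    # Sort exponents from most expanding to most contracting.
    exps = sorted(lyapunov_exponents, reverse=True)
    partial_sum = 0.0
    for j, lam in enumerate(exps):
        if partial_sum + lam < 0.0:
            # The first j exponents sum to a non-negative value;
            # interpolate linearly into the (j+1)-th direction.
            return j + partial_sum / abs(lam)
        partial_sum += lam
    # The partial sums never turn negative: the dimension saturates
    # at the number of exponents.
    return float(len(exps))


# Hypothetical spectrum: one expanding, one neutral, one contracting direction.
print(kaplan_yorke_dimension([0.5, 0.0, -1.0]))  # prints 2.5
```

A spectrum with only negative exponents yields dimension 0 under this definition, and a spectrum whose total sum is non-negative yields the full ambient dimension.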