The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning

Autonomous (noise-agnostic) generative models, such as Equilibrium Matching and blind diffusion, challenge the standard paradigm by learning a single, time-invariant vector field that operates without explicit noise-level conditioning. While recent work suggests that high-dimensional concentration allows these models to implicitly estimate noise levels from corrupted observations, a fundamental paradox remains: what is the underlying landscape being optimized when the noise level is treated as a random variable, and how can a bounded, noise-agnostic network remain stable near the data manifold, where gradients typically diverge? We resolve this paradox by formalizing the Marginal Energy, $E_{\text{marg}}(\mathbf{u}) = -\log p(\mathbf{u})$, where $p(\mathbf{u}) = \int p(\mathbf{u}|t)\,p(t)\,dt$ is the marginal density of the noisy data, integrated over a prior distribution of unknown noise levels. We prove that generation using autonomous models is not merely blind denoising, but a specific form of Riemannian gradient flow on this Marginal Energy. Through a novel relative energy decomposition, we demonstrate that while the raw Marginal Energy landscape possesses a $1/t^p$ singularity normal to the data manifold, the learned time-invariant field implicitly incorporates a local conformal metric that perfectly counteracts the geometric singularity, converting an infinitely deep potential well into a stable attractor. We also establish the structural stability conditions for sampling with autonomous models. We identify a "Jensen Gap" in noise-prediction parameterizations that acts as a high-gain amplifier for estimation errors, explaining the catastrophic failure observed in deterministic blind models. Conversely, we prove that velocity-based parameterizations are inherently stable because they satisfy a bounded-gain condition that absorbs posterior uncertainty into a smooth geometric drift.
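
To make the definition of the Marginal Energy concrete, the following is a minimal numerical sketch, not the authors' implementation. It assumes an additive Gaussian corruption $\mathbf{u} = \mathbf{x} + t\,\boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(0, I)$, an empirical data distribution that is uniform over a finite point set, and a discrete prior over noise levels that stands in for the integral over $p(t)$. None of these choices are specified in the abstract, and the names `marginal_energy` and `energy_at_data` are hypothetical.

```python
import numpy as np

def marginal_energy(u, data, t_grid, t_prior):
    """E_marg(u) = -log p(u), with p(u) = sum_k p(t_k) * p(u | t_k).

    Assumption (not from the abstract): u = x + t * eps, eps ~ N(0, I),
    and x is uniform over the rows of `data`.
    """
    u = np.atleast_2d(u)                                   # (n, d)
    d = u.shape[1]
    # Squared distance from every query point to every data point: (n, m)
    sq = ((u[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    # log N(u; x_j, t_k^2 I) for each (query, data point, noise level): (n, m, k)
    log_lik = (-0.5 * sq[:, :, None] / t_grid**2
               - 0.5 * d * np.log(2.0 * np.pi * t_grid**2))
    # Log-weights: discrete prior over t, uniform weight over data points
    log_terms = log_lik + np.log(t_prior) - np.log(data.shape[0])
    # Numerically stable log-sum-exp over data points and noise levels
    mx = log_terms.max(axis=(1, 2), keepdims=True)
    log_p = mx[:, 0, 0] + np.log(np.exp(log_terms - mx).sum(axis=(1, 2)))
    return -log_p

def energy_at_data(t_min, num_levels=64):
    """Energy evaluated exactly on a one-point 'manifold' at the origin in 2D."""
    data = np.zeros((1, 2))
    t_grid = np.geomspace(t_min, 1.0, num_levels)          # noise levels in [t_min, 1]
    t_prior = np.full(num_levels, 1.0 / num_levels)        # uniform discrete prior
    return marginal_energy(np.zeros(2), data, t_grid, t_prior)[0]

if __name__ == "__main__":
    # The well at the data point deepens as the smallest noise level shrinks.
    for t_min in (1e-1, 1e-2, 1e-3):
        print(f"t_min={t_min:g}  E_marg(data)={energy_at_data(t_min):.3f}")
```

In this toy setting the energy at the data point keeps decreasing as `t_min` shrinks, a small-scale illustration of the singular potential well that the abstract describes for the raw Marginal Energy landscape. The sketch says nothing about the conformal metric or the learned vector field.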
