Score-based methods, which underpin diffusion models and Bayesian inverse problems, are often interpreted as learning the data distribution in the low-noise limit ($\sigma \to 0$). In this work, we propose an alternative perspective: their success arises from implicitly learning the data manifold rather than the full distribution. Our claim is based on a novel analysis of scores in the small-$\sigma$ regime that reveals a sharp separation of scales: information about the data manifold is $\Theta(\sigma^{-2})$ stronger than information about the distribution. We argue that this insight suggests a paradigm shift from the less practical goal of distributional learning to the more attainable task of geometric learning, which provably tolerates $O(\sigma^{-2})$ larger errors in score approximation. We illustrate this perspective through three consequences: i) in diffusion models, concentration on the data support can be achieved with a score error of $o(\sigma^{-2})$, whereas recovering the specific data distribution requires a much stricter $o(1)$ error; ii) more surprisingly, learning the uniform distribution on the manifold (an especially structured and useful object) is also $O(\sigma^{-2})$ easier; and iii) in Bayesian inverse problems, the maximum entropy prior is $O(\sigma^{-2})$ more robust to score errors than generic priors. Finally, we validate our theoretical findings with preliminary experiments on large-scale models, including Stable Diffusion.
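To make the claimed separation of scales concrete, the following heuristic decomposition may help; it is a sketch under standard regularity assumptions, and the notation ($p_0$ a density supported on the data manifold $M$, $\pi_M(x)$ the projection of $x$ onto $M$, $p_\sigma = p_0 * \mathcal{N}(0, \sigma^2 I)$ its Gaussian smoothing, $\nabla_{\!M}$ the gradient along $M$) is ours, not the source's. For $x$ close to $M$, the score is expected to split into a normal and a tangential part:
\[
\nabla_x \log p_\sigma(x) \;\approx\; \underbrace{-\,\frac{x - \pi_M(x)}{\sigma^2}}_{\substack{\text{manifold (normal) component} \\ \Theta(\sigma^{-2})}} \;+\; \underbrace{\nabla_{\!M} \log p_0\big(\pi_M(x)\big)}_{\substack{\text{distributional (tangential) component} \\ O(1)}} .
\]
On this reading, an estimator may err by $o(\sigma^{-2})$ and still capture the dominant normal component that drives concentration onto $M$, whereas recovering the tangential component, and hence the specific distribution, demands the stricter $o(1)$ accuracy stated above.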