Neural Network-Based Score Estimation in Diffusion Models: Optimization and Generalization

Diffusion models have become a leading paradigm in generative AI, with score estimation via denoising score matching as a central component. While recent theory provides strong statistical guarantees, it typically relies on algorithm-agnostic assumptions and treats empirical risk minimization as if it were solved exactly. In practice, however, score functions are parameterized by highly nonconvex neural networks and trained by gradient descent (GD), and it remains unclear whether such practical procedures admit rigorous guarantees. We take a first step toward this question by developing a mathematical framework for score estimation with GD-trained neural networks. Our analysis addresses both optimization and generalization. We introduce a parametric formulation that reduces denoising score matching to a regression problem with noisy labels. This setting poses several challenges, including unbounded inputs, vector-valued outputs, and an additional time variable, which prevent a direct application of existing techniques. We show that, with a suitable design, the dynamics of GD-trained networks can be approximated by a sequence of localized kernel regression problems. We also show that prolonged training on noisy labels leads to overfitting, and derive an early-stopping rule adapted to unbounded domains. As a consequence, we establish the first minimax-optimal generalization bounds for GD-trained neural networks in diffusion models. Experiments on the Credit Default dataset further show that our theory-guided training framework achieves performance comparable to heavily tuned heuristic methods for generating high-fidelity financial tabular data.
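The reduction of denoising score matching to regression with noisy labels can be illustrated with a minimal sketch. It is not the paper's parametric formulation. It assumes a one-dimensional standard Gaussian data distribution, a forward process X_t = alpha * X_0 + sigma * Z with Z standard normal, and a linear score model in place of a neural network; the names `dsm_regression_data` and `fit_linear_score` are hypothetical. Under these assumptions the regression label is -Z / sigma, whose conditional mean given X_t is the true score -x / (alpha^2 + sigma^2), so least squares on the noisy labels should recover that slope.

```python
import numpy as np


def dsm_regression_data(n, alpha, sigma, rng):
    """Draw (input, label) pairs for denoising score matching at one noise level.

    Assumption for illustration: X_0 ~ N(0, 1) and X_t = alpha * X_0 + sigma * Z.
    The label -Z / sigma is the conditional score of X_t given X_0; it equals
    the true score of X_t plus zero-mean noise.
    """
    x0 = rng.standard_normal(n)
    z = rng.standard_normal(n)
    xt = alpha * x0 + sigma * z
    labels = -z / sigma
    return xt, labels


def fit_linear_score(xt, labels):
    """Least-squares slope for the linear model s(x) = slope * x (no intercept)."""
    return float(np.dot(xt, labels) / np.dot(xt, xt))


rng = np.random.default_rng(0)
alpha, sigma = 0.8, 0.6  # illustrative values, chosen so alpha^2 + sigma^2 = 1

xt, labels = dsm_regression_data(200_000, alpha, sigma, rng)
slope = fit_linear_score(xt, labels)

# For Gaussian data, X_t ~ N(0, alpha^2 + sigma^2), so the true score is linear.
true_slope = -1.0 / (alpha**2 + sigma**2)

print(f"estimated slope: {slope:.3f}, true slope: {true_slope:.3f}")
```

The labels here are much noisier than the target: their variance is 1 / sigma^2, while the true score has variance 1 / (alpha^2 + sigma^2). A flexible model trained for too long can fit that excess noise, which is the overfitting that the early-stopping rule is meant to prevent.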
