Recent works have shown that diffusion models can learn essentially any distribution provided one can perform score estimation. Yet it remains poorly understood under what settings score estimation is possible, let alone when practical gradient-based algorithms for this task can provably succeed. In this work, we give the first provably efficient results along these lines for one of the most fundamental distribution families, Gaussian mixture models. We prove that gradient descent on the denoising diffusion probabilistic model (DDPM) objective can efficiently recover the ground truth parameters of the mixture model in the following two settings: 1) We show gradient descent with random initialization learns mixtures of two spherical Gaussians in $d$ dimensions with $1/\text{poly}(d)$-separated centers. 2) We show gradient descent with a warm start learns mixtures of $K$ spherical Gaussians with $\Omega(\sqrt{\log(\min(K,d))})$-separated centers. A key ingredient in our proofs is a new connection between score-based methods and two other approaches to distribution learning, the EM algorithm and spectral methods.
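As an illustration of the first setting, the following is a minimal sketch of gradient descent on a DDPM-style denoising objective for a symmetric mixture of two spherical Gaussians, $\tfrac12 N(\mu, I) + \tfrac12 N(-\mu, I)$. Everything here is an assumption made for illustration and is not taken from the abstract: the symmetric parameterization, the forward process $x_t = e^{-t} x_0 + \sqrt{1 - e^{-2t}}\, z$, the use of a single fixed noise level `t`, the score model $s_\theta(x) = \tanh(\theta_t^\top x)\,\theta_t - x$ with $\theta_t = e^{-t}\theta$, and all function names and hyperparameters. The gradient is written out by hand so that the sketch depends only on NumPy.

```python
import numpy as np

def ddpm_loss_and_grad(theta, x0, t, rng):
    """Denoising loss E||s_theta(x_t) + z/sigma||^2 and its gradient in theta.

    Assumed (hypothetical) score model for the symmetric two-component mixture:
        s_theta(x) = tanh(a * theta . x) * a * theta - x,  with a = exp(-t).
    """
    a = np.exp(-t)
    sigma = np.sqrt(1.0 - a ** 2)
    z = rng.standard_normal(x0.shape)        # fresh noise for this step
    xt = a * x0 + sigma * z                  # noised samples, shape (n, d)
    u = a * (xt @ theta)                     # pre-activation, shape (n,)
    th = np.tanh(u)
    # Residual between the model score and the denoising target -z/sigma.
    resid = a * th[:, None] * theta[None, :] - xt + z / sigma
    loss = np.mean(np.sum(resid ** 2, axis=1))
    # Chain rule: d s_theta / d theta = a*tanh(u)*I + a^2*sech^2(u)*theta x^T.
    sech2 = 1.0 - th ** 2
    rdot = resid @ theta
    per_sample = a * th[:, None] * resid + (a ** 2) * (sech2 * rdot)[:, None] * xt
    grad = 2.0 * np.mean(per_sample, axis=0)
    return loss, grad

def learn_two_gaussian_mixture(mu, n=5000, t=0.5, lr=0.05, steps=2000, seed=0):
    """Run plain gradient descent from a random initialization (illustrative)."""
    rng = np.random.default_rng(seed)
    d = mu.shape[0]
    # Draw n samples from 0.5*N(mu, I) + 0.5*N(-mu, I).
    signs = rng.choice([-1.0, 1.0], size=n)
    x0 = signs[:, None] * mu[None, :] + rng.standard_normal((n, d))
    theta = rng.standard_normal(d) / np.sqrt(d)   # random initialization
    for _ in range(steps):
        _, grad = ddpm_loss_and_grad(theta, x0, t, rng)
        theta = theta - lr * grad
    return theta

if __name__ == "__main__":
    mu = np.array([2.0, 0.0, 0.0, 0.0])
    theta = learn_two_gaussian_mixture(mu)
    # The mixture is symmetric, so mu is identifiable only up to sign.
    err = min(np.linalg.norm(theta - mu), np.linalg.norm(theta + mu))
    print("estimate:", theta, "error up to sign:", err)
```

Because the mixture is invariant under $\mu \mapsto -\mu$, the estimate should be compared with the true center only up to sign. This sketch is not meant to reproduce the guarantees stated in the abstract, and it does not cover the warm-start setting with $K$ components.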