This paper addresses the statistical estimation of Gaussian Mixture Models (GMMs) with unknown diagonal covariances from independent and identically distributed samples. We employ the Beurling-LASSO (BLASSO), a convex optimization framework that promotes sparsity in the space of measures, to simultaneously estimate the number of components and their parameters. Our main contribution extends the BLASSO methodology to multivariate GMMs with component-specific unknown diagonal covariance matrices. This setting is significantly more flexible than previous approaches, which required known and identical covariances. We establish non-asymptotic recovery guarantees with nearly parametric convergence rates for component means, diagonal covariances, and weights, as well as for density prediction. A key theoretical contribution is the identification of an explicit separation condition on mixture components that enables the construction of non-degenerate dual certificates, which are essential tools for establishing statistical guarantees for the BLASSO. Our analysis leverages the Fisher-Rao geometry of the statistical model and introduces a novel semi-distance adapted to our framework, providing new insights into the interplay between component separation, parameter space geometry, and achievable statistical recovery.