The purpose of this paper is twofold. First, we propose a novel algorithm for estimating parameters in one-dimensional Gaussian mixture models (GMMs). The algorithm takes advantage of the Hankel structure inherent in the Fourier data obtained from independent and identically distributed (i.i.d.) samples of the mixture. For GMMs with a unified variance, we introduce a singular value ratio functional of the Fourier data and use it to determine the variance and the number of components simultaneously. We prove the consistency of the estimator. Unlike classic algorithms such as the method of moments and the maximum likelihood method, the proposed algorithm requires neither prior knowledge of the number of Gaussian components nor good initial guesses. Numerical experiments demonstrate its superior performance in estimation accuracy and computational cost. Second, we reveal that there is a fundamental limit to estimating the number of Gaussian components, i.e., the model order, of the mixture model when the number of i.i.d. samples is finite. For the unified-variance case, we show that the model order can be successfully estimated only if the minimum separation distance between the component means exceeds a certain threshold value, and that estimation can fail when the separation is below it. We derive a lower bound for this threshold value, referred to as the computational resolution limit, in terms of the number of i.i.d. samples, the variance, and the number of Gaussian components. Numerical experiments confirm this phase transition phenomenon in estimating the model order. Moreover, we demonstrate that our algorithm achieves better scores in likelihood, AIC, and BIC than the EM algorithm.
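To make the Hankel structure concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes the Fourier data is the empirical characteristic function sampled on an equispaced frequency grid, and that multiplying it by exp(v ω²/2) for a candidate variance v removes the Gaussian damping. The function names, the grid, and the mixture parameters are all hypothetical choices made for illustration. When v matches the true unified variance, the compensated data is close to a sum of complex exponentials, one per component, so the Hankel matrix built from it is close to rank k and its singular values drop sharply after the k-th.

```python
import numpy as np


def empirical_fourier_data(samples, omegas):
    """Empirical characteristic function: mean of exp(i * omega * x) over the samples."""
    return np.exp(1j * np.outer(omegas, samples)).mean(axis=1)


def hankel_singular_values(fourier_data, omegas, variance):
    """Singular values of the Hankel matrix of variance-compensated Fourier data.

    Assumes `omegas` is an equispaced grid of odd length 2K + 1 (illustrative setup).
    """
    # Undo the Gaussian damping exp(-variance * omega^2 / 2) for the candidate variance.
    compensated = fourier_data * np.exp(0.5 * variance * omegas**2)
    m = (len(omegas) - 1) // 2 + 1
    # Hankel matrix: entry (a, b) holds the sample at frequency index a + b.
    idx = np.arange(m)[:, None] + np.arange(m)[None, :]
    return np.linalg.svd(compensated[idx], compute_uv=False)


# Hypothetical two-component mixture with a unified variance of 1.
rng = np.random.default_rng(0)
means = np.array([-2.0, 2.0])
labels = rng.integers(0, 2, size=20000)          # equal mixing weights
samples = means[labels] + rng.standard_normal(20000)

omegas = np.linspace(-1.0, 1.0, 9)               # 2K + 1 = 9 frequencies
data = empirical_fourier_data(samples, omegas)

# Compare the singular value decay for the true variance and a wrong candidate.
for candidate in (1.0, 0.0):
    print(candidate, hankel_singular_values(data, omegas, candidate))
```

With the true variance, the gap after the second singular value should be much more pronounced than with the wrong candidate, up to sampling noise of order 1/sqrt(n). A ratio of consecutive singular values is one natural way to quantify that gap; the exact functional used in the paper is not given in the abstract.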