The expectation-maximization (EM) algorithm is a widely used iterative algorithm for computing a (local) maximum likelihood estimate (MLE). It applies to a wide range of problems, including the clustering of data based on the Gaussian mixture model (GMM). Numerical instability and convergence problems may arise when the sample size is not much larger than the data dimensionality. In such low sample support (LSS) settings, the covariance matrix update in the EM-GMM algorithm may become singular or poorly conditioned, causing the algorithm to crash. On the other hand, in many signal processing problems, a priori information may be available that indicates certain structures for the different cluster covariance matrices. In this paper, we present a regularized EM algorithm for GMMs that can make efficient use of such prior knowledge as well as cope with LSS situations. The method aims to maximize a penalized GMM likelihood, where regularized estimation may be used to ensure positive definiteness of the covariance matrix updates and to shrink the estimators towards some structured target covariance matrices. We show that the theoretical guarantees of convergence hold, leading to a better-performing EM algorithm for structured covariance matrix models or in low sample support settings.
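To make the idea of a regularized covariance update concrete, the following is a minimal sketch of an EM-GMM loop in which each M-step covariance is shrunk towards a target matrix. The abstract does not state the penalty or the resulting update, so everything specific here is an assumption for illustration: the linear shrinkage form `(1 - beta) * S + beta * target`, the scaled-identity target, the fixed shrinkage weight `beta`, and the function names `log_gaussian` and `regularized_em_gmm`. It is not the authors' algorithm, only one simple way to keep the updates positive definite when the sample size is close to the dimension.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log-density of N(mean, cov) at each row of X, via a Cholesky factor."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)                 # fails if cov is not positive definite
    z = np.linalg.solve(L, (X - mean).T)        # whitened residuals, shape (d, n)
    log_det = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + (z ** 2).sum(axis=0))

def regularized_em_gmm(X, k, beta=0.2, n_iter=50, seed=0):
    """EM for a k-component GMM with shrinkage covariance updates.

    Assumption (not from the paper): each covariance update is
    (1 - beta) * S + beta * target, with target = (trace(S) / d) * I.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)]
    covs = np.stack([np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: responsibilities, computed in the log domain for stability.
        log_p = np.stack(
            [np.log(weights[j]) + log_gaussian(X, means[j], covs[j]) for j in range(k)],
            axis=1,
        )
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: weights and means as in standard EM, covariances shrunk.
        nk = resp.sum(axis=0) + 1e-12           # guard against empty components
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            Xc = X - means[j]
            S = (resp[:, j, None] * Xc).T @ Xc / nk[j]   # weighted sample covariance
            scale = max(np.trace(S) / d, 1e-8)           # floor keeps the target nonzero
            target = scale * np.eye(d)                   # hypothetical structured target
            covs[j] = (1.0 - beta) * S + beta * target   # positive definite for beta > 0
    return weights, means, covs

# Usage in a low sample support setting: 12 samples in 10 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (6, 10)), rng.normal(4.0, 1.0, (6, 10))])
weights, means, covs = regularized_em_gmm(X, k=2, beta=0.3, n_iter=20)
print(min(np.linalg.eigvalsh(c).min() for c in covs))
```

With `beta = 0` the update reduces to the ordinary weighted sample covariance, which is rank-deficient whenever a cluster has fewer effective samples than dimensions, so the Cholesky factorization in the next E-step would typically fail. Any `beta > 0` bounds the smallest eigenvalue away from zero. A different structured target (for example a diagonal or Toeplitz matrix) could be substituted for the scaled identity if prior knowledge suggests it.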