Robust clustering of high-dimensional data is an important topic because clusters in real datasets are often heavy-tailed and/or asymmetric. Traditional approaches to model-based clustering often fail for high-dimensional data, e.g., because the number of free covariance parameters grows quadratically with the dimension. A parameterization of the component scale matrices for the mixture of generalized hyperbolic distributions is proposed. This parameterization includes a penalty term in the likelihood. An analytically feasible expectation-maximization algorithm is developed by placing a gamma-lasso penalty that constrains the concentration matrix. The proposed methodology is investigated through simulation studies and illustrated using two real datasets.
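To make the remark about free covariance parameters concrete, the short sketch below counts the free entries in unconstrained symmetric scale matrices. It is an illustration only, not part of the proposed methodology: the function name `free_scale_parameters` and the choice of G = 3 components are assumptions made for the example, and the count uses only the standard fact that a symmetric p x p matrix has p(p+1)/2 free entries.

```python
def free_scale_parameters(p: int, G: int = 1) -> int:
    """Count the free entries in G unconstrained symmetric p x p scale matrices.

    Hypothetical helper for illustration: each symmetric p x p matrix
    contributes p * (p + 1) / 2 free parameters.
    """
    return G * p * (p + 1) // 2


if __name__ == "__main__":
    # G = 3 components is an arbitrary choice for the illustration.
    for p in (2, 10, 100, 1000):
        print(f"p = {p:>4}: {free_scale_parameters(p, G=3)} free scale parameters")
```

Under these assumptions the count grows quadratically in p, which is the difficulty that a penalty on the concentration matrix is intended to address.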