Mixtures of matrix Gaussian distributions provide a probabilistic framework for clustering continuous matrix-variate data, which are becoming increasingly prevalent in various fields. Despite its widespread adoption and successful application, this approach suffers from over-parameterization, making it poorly suited even to matrix-variate data of moderate size. To overcome this drawback, we introduce a sparse model-based clustering approach for three-way data. Our approach assumes that the matrix mixture parameters are sparse, with different degrees of sparsity across clusters, which induces parsimony in a flexible manner. Estimation of the model relies on the maximization of a penalized likelihood, with specifically tailored group and graphical lasso penalties. These penalties enable the selection of the most informative features for clustering three-way data, where variables are recorded over multiple occasions, and allow cluster-specific association structures to be captured. The proposed methodology is tested extensively on synthetic data, and its validity is demonstrated in an application to time-dependent crime patterns in different US cities.
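The over-parameterization issue can be made concrete with a minimal sketch that counts the free parameters of an unconstrained matrix Gaussian mixture. It is an illustration only and not part of the proposed methodology. It assumes that each of the k components has a p x q mean matrix, a p x p row covariance matrix and a q x q column covariance matrix, with one parameter removed per component because the two covariance matrices are identifiable only up to a scalar. The function name `matrix_normal_mixture_param_count` is hypothetical.

```python
def matrix_normal_mixture_param_count(p: int, q: int, k: int) -> int:
    """Count free parameters of an unconstrained k-component matrix Gaussian mixture.

    Assumption (illustrative): every component has its own p x q mean,
    p x p row covariance and q x q column covariance.
    """
    mean = p * q                   # entries of the mean matrix
    row_cov = p * (p + 1) // 2     # symmetric p x p row covariance
    col_cov = q * (q + 1) // 2     # symmetric q x q column covariance
    # Subtract 1: the Kronecker structure fixes the two covariances
    # only up to a multiplicative constant.
    per_component = mean + row_cov + col_cov - 1
    mixing = k - 1                 # mixing proportions sum to one
    return mixing + k * per_component


if __name__ == "__main__":
    # Parameter counts for a few moderate matrix sizes with k = 3 clusters.
    for p, q in [(5, 5), (10, 10), (20, 10)]:
        print(p, q, matrix_normal_mixture_param_count(p, q, k=3))
```

Under these assumptions, three clusters of 10 x 10 matrices already require 629 free parameters, which is the kind of growth that sparsity-inducing penalties are meant to counter.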