Many common clustering methods cannot be used to cluster multivariate longitudinal data when the variables exhibit high autocorrelation. In this article, a copula kernel mixture model (CKMM) is proposed for clustering data of this type. The CKMM is a finite mixture model that decomposes each mixture component's joint density function into its copula and its marginal distribution functions. In this decomposition, the Gaussian copula is used because of its mathematical tractability, and Gaussian kernel functions are used to estimate the marginal distributions. A generalized expectation-maximization algorithm is used to estimate the model parameters. The performance of the proposed model is assessed in a simulation study and on two real datasets. The proposed model is shown to perform well in comparison with standard methods, such as K-means clustering with dynamic time warping and latent growth models.
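The decomposition described above can be illustrated with a minimal sketch. By Sklar's theorem, a component density factors as f(x) = c(F_1(x_1), ..., F_d(x_d)) · f_1(x_1) ··· f_d(x_d), where c is the copula density and F_j, f_j are the marginal distribution and density functions. The code below evaluates this for a single component, using a Gaussian copula and Gaussian kernel estimates of the marginals. The function names, the fixed shared bandwidth `h`, and the way the correlation matrix `R` is supplied are assumptions made for illustration; the abstract does not specify how the article chooses bandwidths, estimates the copula parameters, or organizes the longitudinal measurements.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: pdf, cdf and inverse cdf


def kernel_marginal(x, sample, h):
    """Gaussian-kernel estimates of a marginal density and CDF at scalar x.

    `sample` holds the observations of one variable; `h` is a bandwidth
    (a hypothetical fixed value here, not the article's choice).
    """
    t = [(x - s) / h for s in sample]
    pdf = sum(_STD_NORMAL.pdf(v) for v in t) / (len(sample) * h)
    cdf = sum(_STD_NORMAL.cdf(v) for v in t) / len(sample)
    return pdf, cdf


def gaussian_copula_log_density(u, R):
    """Log density of a Gaussian copula with correlation matrix R at u in (0, 1)^d."""
    z = np.array([_STD_NORMAL.inv_cdf(v) for v in u])  # normal scores
    _, logdet = np.linalg.slogdet(R)
    quad = z @ (np.linalg.inv(R) - np.eye(len(z))) @ z
    return -0.5 * logdet - 0.5 * quad


def component_log_density(x, samples, R, h):
    """log f(x) = log c(F_1(x_1), ..., F_d(x_d)) + sum_j log f_j(x_j)."""
    pdfs, cdfs = zip(*(kernel_marginal(xj, sj, h) for xj, sj in zip(x, samples)))
    cdfs = np.clip(cdfs, 1e-10, 1 - 1e-10)  # keep the inverse CDF finite
    return gaussian_copula_log_density(cdfs, R) + float(np.sum(np.log(pdfs)))
```

In a mixture setting, a log density of this form would be computed for each component and combined with the mixing proportions to obtain posterior membership probabilities in the E-step. That step is omitted here because the abstract gives no details of it.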