In the social sciences, studies are often based on questionnaires that ask participants to give ordered responses several times over a study period. We present a model-based clustering algorithm for such longitudinal ordinal data. Assuming that each ordinal variable is the discretization of an underlying latent continuous variable, the model relies on a mixture of matrix-variate normal distributions that accounts simultaneously for within- and between-time dependence structures. The model thus captures, at once, the heterogeneity among participants, the association among the responses, and the temporal dependence structure. An EM algorithm is developed for parameter estimation. An evaluation on synthetic data illustrates the model's estimation accuracy and its advantages over competing methods. A real-world application concerning changes in eating behaviours during the Covid-19 pandemic period in France is also presented.
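To make the data-generating assumption concrete, the following is a minimal sketch of how longitudinal ordinal responses could be simulated from a mixture of matrix-variate normals followed by thresholding. It is an illustration only, not the estimation procedure: the function names, the single set of thresholds shared by all items and time points, the compound-symmetric between-item covariance, and the AR(1)-style between-time covariance are all assumptions chosen for the example and may differ from the paper's parameterization.

```python
import numpy as np

def sample_matrix_normal(rng, mean, row_cov, col_cov):
    """Draw one p x T matrix X with vec(X) ~ N(vec(mean), col_cov kron row_cov)."""
    a = np.linalg.cholesky(row_cov)        # p x p factor, row_cov = a @ a.T
    b = np.linalg.cholesky(col_cov)        # T x T factor, col_cov = b @ b.T
    z = rng.standard_normal(mean.shape)    # i.i.d. standard normal entries
    return mean + a @ z @ b.T

def simulate_ordinal_panel(n, weights, means, row_covs, col_covs, thresholds, seed=0):
    """Simulate n participants: cluster label, latent matrix, then discretization."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(weights), size=n, p=weights)
    latent = np.stack([
        sample_matrix_normal(rng, means[k], row_covs[k], col_covs[k])
        for k in labels
    ])
    # Hypothetical simplification: the same thresholds for every item and time.
    ordinal = np.digitize(latent, thresholds) + 1   # categories 1..len(thresholds)+1
    return labels, ordinal

# Illustrative settings (assumed): 3 items, 4 time points, 2 clusters, 4 categories.
p, t = 3, 4
means = [np.full((p, t), -0.8), np.full((p, t), 0.8)]
row_cov = 0.5 * np.eye(p) + 0.5 * np.ones((p, p))              # between-item dependence
col_cov = 0.6 ** np.abs(np.subtract.outer(np.arange(t), np.arange(t)))  # between-time
thresholds = np.array([-1.0, 0.0, 1.0])

labels, ordinal = simulate_ordinal_panel(
    200, [0.4, 0.6], means, [row_cov] * 2, [col_cov] * 2, thresholds
)
```

Here `ordinal[i]` is the 3 x 4 table of ordered responses for participant `i`, with rows indexing questionnaire items and columns indexing time points. Separating the row and column covariances is what lets a matrix-variate normal describe dependence among responses and dependence over time with far fewer parameters than an unstructured covariance on the vectorized data.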