Model-based Clustering of Individuals' Ecological Momentary Assessment Time-series Data for Improving Forecasting Performance

In Ecological Momentary Assessment (EMA) studies, time-series data are collected from multiple individuals by continuously monitoring various items of emotional behavior. Such complex data are commonly analyzed at the individual level, using personalized models. However, information from similar individuals is believed likely to enhance these models, leading to a better description of each individual. Clustering is therefore investigated with the aim of grouping together the most similar individuals and then using this information in group-based models to improve the predictive performance for each individual. More specifically, two model-based clustering approaches are examined: the first uses parameters extracted from personalized models, whereas the second is optimized on model-based forecasting performance. Both methods are then analyzed using intrinsic clustering evaluation measures (e.g. Silhouette coefficients) as well as the performance of a downstream forecasting scheme, in which each forecasting group-model is devoted to describing all individuals belonging to one cluster. Of the two, clustering based on performance shows the best results on all examined evaluation measures. As a further level of evaluation, the performance of these group-models is compared to three baseline scenarios: the personalized, the all-in-one group, and the random group-based concept. This comparison again confirms the superiority of the clustering-based methods, indicating that the use of group-based information can effectively enhance the overall performance across all individuals' data.
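To make the first approach (clustering on model-extracted parameters) concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each individual's series is described by a univariate AR(1) model, uses a plain k-means on the fitted coefficients, and runs on simulated data; the abstract specifies none of these choices. All function names (`simulate_ar1`, `fit_ar1`, `kmeans`, `silhouette`, `one_step_mse`) are hypothetical helpers written for this illustration.

```python
import numpy as np

def simulate_ar1(phi, n_steps, rng):
    """Simulate a zero-mean AR(1) series (stand-in for one individual's EMA item)."""
    x = np.zeros(n_steps)
    for t in range(1, n_steps):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def lagged_design(series):
    """Return the design matrix [1, x[t-1]] and the target x[t]."""
    x_prev = series[:-1]
    return np.column_stack([np.ones_like(x_prev), x_prev]), series[1:]

def fit_ar1(series_list):
    """Least-squares AR(1) fit, pooled over one or more series."""
    designs, targets = zip(*(lagged_design(s) for s in series_list))
    coef, *_ = np.linalg.lstsq(np.vstack(designs), np.concatenate(targets), rcond=None)
    return coef  # [intercept, phi]

def one_step_mse(coef, series):
    """Mean squared one-step-ahead forecast error of a fitted model on a series."""
    design, target = lagged_design(series)
    return float(np.mean((design @ coef - target) ** 2))

def assign(points, centers):
    """Index of the nearest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed clustering algorithm for this sketch)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return assign(points, centers)

def silhouette(points, labels):
    """Mean Silhouette coefficient, computed from Euclidean distances."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = []
    for i, own in enumerate(labels):
        same = labels == own
        if same.sum() == 1:  # singleton cluster: score defined as 0
            scores.append(0.0)
            continue
        a = dists[i, same].sum() / (same.sum() - 1)
        b = min(dists[i, labels == other].mean()
                for other in np.unique(labels) if other != own)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Simulated cohort (assumption): two latent groups of 10 individuals each.
rng = np.random.default_rng(0)
true_phi = [0.8] * 10 + [-0.5] * 10
data = [simulate_ar1(phi, 200, rng) for phi in true_phi]
train = [s[:150] for s in data]
test = [s[150:] for s in data]

# Step 1: personalized models; their parameters become clustering features.
params = np.array([fit_ar1([s]) for s in train])
# Step 2: cluster individuals and score the partition intrinsically.
labels = kmeans(params, k=2)
sil = silhouette(params, labels)
# Step 3: one group-model per cluster, evaluated on held-out data.
group_coef = {int(j): fit_ar1([s for s, l in zip(train, labels) if l == j])
              for j in np.unique(labels)}
group_mse = float(np.mean([one_step_mse(group_coef[int(l)], s)
                           for s, l in zip(test, labels)]))
personal_mse = float(np.mean([one_step_mse(p, s) for p, s in zip(params, test)]))

print("silhouette:", round(sil, 3))
print("group-model MSE:", round(group_mse, 3))
print("personalized MSE:", round(personal_mse, 3))
```

The sketch covers only the parameter-based approach and the personalized baseline. The performance-based approach would instead assign individuals to the group-model that forecasts them best, and the all-in-one and random-group baselines could be obtained by calling `fit_ar1` on all training series or on randomly drawn groups.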
