We consider the problem of analyzing multivariate time series collected on multiple subjects, with the goal of identifying groups of subjects exhibiting similar trends in their recorded measurements over time, as well as time-varying groups of associated measurements. To this end, we propose a Bayesian model for temporal biclustering featuring nested partitions, where a time-invariant partition of subjects induces a time-varying partition of measurements. Our approach allows for data-driven determination of the number of subject and measurement clusters, as well as estimation of the number and location of changepoints in the measurement partitions. To efficiently perform model fitting and posterior estimation with Markov chain Monte Carlo, we derive a blocked update of the measurements' cluster-assignment sequences. We illustrate the performance of our model in two applications: one to functional magnetic resonance imaging data and one to an electroencephalogram dataset. The results indicate that the proposed model can combine information from potentially many subjects to discover a set of interpretable, dynamic patterns. Experiments on simulated data compare the estimates of the proposed model against ground-truth values and against other statistical methods, showing that it performs well at identifying ground-truth subject and measurement clusters even when no subject or time dependence is present.