In this article, we consider the problem of clustering multi-view data, that is, information associated with individuals that forms heterogeneous data sources (the views). We adopt a Bayesian model whose prior structure assumes that each individual belongs to a baseline cluster and, conditionally on the baseline, allows each individual in each view to belong to a cluster different from the baseline. We call such a structure "latent modularity". For each cluster in each view, we then specify a statistical model with an associated prior. We derive expressions for the marginal priors on the view-specific cluster labels and the associated partitions, giving several insights into our chosen prior structure. Using simple Markov chain Monte Carlo algorithms, we examine our model in a simulation study, along with a more detailed case study that requires several modeling innovations.
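To make the idea of a baseline cluster with view-specific departures concrete, the following is a minimal generative sketch. It is not the prior derived in the article: the Chinese restaurant process for the baseline labels, the concentration parameter `alpha`, the probability `stay_prob` of keeping the baseline label, and the uniform redraw rule are all assumptions chosen purely for illustration, and the function names are hypothetical.

```python
import random


def sample_baseline_labels(n, alpha, rng):
    """Draw baseline cluster labels from a Chinese restaurant process.

    ASSUMPTION: the CRP is an illustrative choice, not the article's prior.
    """
    labels = []
    counts = []  # counts[k] = number of individuals already in cluster k
    for _ in range(n):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def sample_view_labels(baseline, n_views, stay_prob, rng):
    """Draw view-specific labels conditionally on the baseline labels.

    ASSUMPTION: each individual keeps its baseline label with probability
    stay_prob; otherwise a label is redrawn uniformly over the baseline
    clusters (the redraw may land on the baseline label again).
    """
    n_clusters = max(baseline) + 1  # CRP labels are contiguous: 0..K-1
    views = []
    for _ in range(n_views):
        view = [
            b if rng.random() < stay_prob else rng.randrange(n_clusters)
            for b in baseline
        ]
        views.append(view)
    return views


rng = random.Random(0)
baseline = sample_baseline_labels(n=10, alpha=1.0, rng=rng)
views = sample_view_labels(baseline, n_views=3, stay_prob=0.8, rng=rng)
print("baseline:", baseline)
for v, labels in enumerate(views):
    print(f"view {v}:  ", labels)
```

With `stay_prob` set to 1.0 every view reproduces the baseline partition, while smaller values let the view-specific partitions drift away from it, which is the qualitative behavior the term "latent modularity" describes.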