While there is an immense literature on Bayesian methods for clustering, the multiview case has received little attention. This problem focuses on obtaining distinct but statistically dependent clusterings of a common set of entities for different data types. For example, patients may be clustered into subgroups, with subgroup membership varying according to the domain of the patient variables. A challenge is how to model the across-view dependence between the partitions of patients into subgroups. The complexities of the partition space make standard methods for modeling dependence, such as correlation, infeasible. In this article, we propose CLustering with Independence Centering (CLIC), a clustering prior that uses a single parameter to explicitly model dependence between clusterings across views. CLIC is induced by the product centered Dirichlet process (PCDP), a novel hierarchical prior that bridges between independent and equivalent partitions. We show appealing theoretical properties, provide a finite approximation and prove its accuracy, present a marginal Gibbs sampler for posterior computation, and derive closed-form expressions for the marginal and joint partition distributions for the CLIC model. On synthetic data and in an application to epidemiology, CLIC accurately characterizes view-specific partitions while providing inference on the dependence level.
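To make the two extremes mentioned above concrete (equivalent partitions versus unrelated ones), the following minimal sketch compares two clusterings of the same entities using the Rand index, the fraction of entity pairs on which both partitions agree. This is only a descriptive illustration and is not part of CLIC or the PCDP; the function name `rand_index` and the toy label vectors are assumptions introduced here.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of entity pairs on which two partitions agree.

    A pair counts as an agreement when both views put the two entities
    in the same cluster, or both views put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both views must label the same entities")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two entities")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy data: six entities clustered in two views.
view_1 = [0, 0, 0, 1, 1, 1]
view_2_same = [1, 1, 1, 0, 0, 0]  # same partition, clusters relabeled
view_2_diff = [0, 1, 2, 0, 1, 2]  # every cluster cuts across view_1

print(rand_index(view_1, view_2_same))  # 1.0
print(rand_index(view_1, view_2_diff))  # 0.4
```

The comparison works on pairs of entities rather than on the labels themselves, because cluster labels are arbitrary: relabeling a partition leaves it unchanged, which is one reason an ordinary correlation between label vectors is not meaningful here.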