We consider dependent clustering of observations that are organized in groups. The proposed model, called the plaid atoms model (PAM), estimates a set of clusters for each group and allows each cluster to be either shared with other groups or unique to a single group. PAM extends the well-known stick-breaking process by allowing zero as a possible value for the cluster weights, which leads to a zero-augmented beta (ZAB) distribution in the model. Because ZAB lets a cluster weight be exactly zero in some groups, an atom can be present in several groups (shared) or in only one (unique). We explore theoretical properties of PAM and show its connection to known Bayesian nonparametric models. We propose an efficient slice sampler for posterior inference. Minor extensions of the proposed model to multivariate and count data are presented. Simulation studies and applications to real-world datasets illustrate the model's performance.
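The zero-augmented stick-breaking idea can be illustrated with a minimal sketch. The abstract does not give the exact construction, so the details below are assumptions made for illustration: each group draws its own stick proportions, a proportion is exactly zero with probability `p_zero` and otherwise follows Beta(1, `alpha`), the process is truncated at `n_atoms` atoms, and all groups index the same set of atoms. The function name `zab_stick_breaking` and its parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def zab_stick_breaking(n_groups, n_atoms, p_zero, alpha, rng):
    """Draw truncated stick-breaking weights with zero-augmented beta sticks.

    Assumed construction (illustrative, not the paper's exact model):
    v[j, k] = 0 with probability p_zero, else v[j, k] ~ Beta(1, alpha), and
    w[j, k] = v[j, k] * prod_{l < k} (1 - v[j, l]).
    """
    is_zero = rng.random((n_groups, n_atoms)) < p_zero
    beta_draws = rng.beta(1.0, alpha, size=(n_groups, n_atoms))
    sticks = np.where(is_zero, 0.0, beta_draws)
    # Stick length remaining before each break; it is 1 before the first break.
    remaining = np.cumprod(1.0 - sticks, axis=1)
    remaining = np.concatenate(
        [np.ones((n_groups, 1)), remaining[:, :-1]], axis=1
    )
    return sticks * remaining

rng = np.random.default_rng(0)
weights = zab_stick_breaking(n_groups=3, n_atoms=10, p_zero=0.4, alpha=1.0, rng=rng)

# An atom is active in a group when its weight there is nonzero.
active = weights > 0
groups_per_atom = active.sum(axis=0)
shared = np.flatnonzero(groups_per_atom >= 2)  # atoms used by two or more groups
unique = np.flatnonzero(groups_per_atom == 1)  # atoms used by exactly one group
print("shared atoms:", shared)
print("unique atoms:", unique)
```

Because the process is truncated, each row of `weights` sums to at most one; the leftover mass is simply ignored in this sketch. A zero stick leaves the remaining stick length unchanged, so skipping an atom in one group does not affect the weights of later atoms in that group.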