We propose the Plaid Atoms Model (PAM), a novel Bayesian nonparametric model for grouped data. Founded on the idea of 'atom skipping', PAM belongs to a well-established category of models that generate dependent random distributions and clusters across multiple groups. Atom skipping refers to stochastically assigning zero weights to atoms in an infinite mixture. By deploying atom skipping across groups, PAM produces a dependent clustering pattern with both overlapping and non-overlapping clusters across groups. As a result, interpretable posterior inference is possible, such as reporting the posterior probability that a cluster is exclusive to a single group or shared among a subset of groups. We discuss the theoretical properties of the proposed and related models. Minor extensions of the proposed model for multivariate or count data are presented. Simulation studies and applications using real-world datasets illustrate the performance of the new models in comparison with existing models.
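To make the idea of atom skipping concrete, the following is a minimal sketch and not the construction used in the paper. It assumes a truncated stick-breaking representation in which each stick fraction is independently set to zero with probability `1 - keep_prob` in each group; the function names, the truncation level, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def skipped_stick_breaking(n_groups, n_atoms, keep_prob, concentration, rng):
    """Truncated stick-breaking weights with stochastic atom skipping.

    Assumption (not taken from the paper): each stick fraction is zeroed
    with probability 1 - keep_prob, independently per group and atom.
    """
    # Beta(1, concentration) stick fractions, one row per group.
    fractions = rng.beta(1.0, concentration, size=(n_groups, n_atoms))
    # Atom skipping: a skipped atom gets stick fraction 0, hence weight 0.
    kept = rng.random((n_groups, n_atoms)) < keep_prob
    fractions = np.where(kept, fractions, 0.0)
    # Truncation artifact: the last atom absorbs the leftover mass,
    # so every row of weights sums to 1.
    fractions[:, -1] = 1.0
    # remaining[j, k] = prod over l < k of (1 - fractions[j, l])
    remaining = np.cumprod(1.0 - fractions, axis=1)
    remaining = np.hstack([np.ones((n_groups, 1)), remaining[:, :-1]])
    return fractions * remaining


def groups_per_atom(weights):
    """Number of groups in which each atom has positive weight."""
    return (weights > 0).sum(axis=0)


rng = np.random.default_rng(0)
weights = skipped_stick_breaking(
    n_groups=3, n_atoms=20, keep_prob=0.5, concentration=1.0, rng=rng
)
counts = groups_per_atom(weights)
# Atoms active in exactly one group are exclusive to that group;
# atoms active in two or more groups are shared.
exclusive_atoms = np.flatnonzero(counts == 1)
shared_atoms = np.flatnonzero(counts >= 2)
```

In this sketch the atoms are common to all groups and only the weights differ, so an atom with positive weight in exactly one group corresponds to a cluster exclusive to that group, while an atom with positive weight in several groups corresponds to a shared cluster. Because of the truncation, the last atom is never skipped and therefore always appears as shared.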