This work focuses on clustering populations whose hierarchical dependency structure can be described by a tree. A prominent example, and the focus of our work, is the phylogenetic tree, whose nodes often represent biological species. Clustering the populations in this setting is equivalent to identifying the branches of the tree along which the populations at the parent and child nodes have significantly different distributions. We construct a nonparametric Bayesian model based on hierarchical Pitman-Yor and Poisson processes that exploits this hierarchical structure; a key contribution is its ability to share statistical information between subpopulations. We develop an efficient particle MCMC algorithm to address the computational challenges of posterior inference. We illustrate the efficacy of the proposed approach on both synthetic and real-world problems.
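The equivalence between clustering and branch identification can be illustrated with a small sketch. It assumes the set of "changed" branches is already known. In the model described above, that set would instead be inferred from data. In this sketch, each flagged branch starts a new cluster, and every descendant reached without crossing another flagged branch inherits that cluster. The function name `clusters_from_branches` and its input format (a parent map plus a set of flagged child nodes) are hypothetical. They are not the authors' model or inference procedure.

```python
from collections import defaultdict


def clusters_from_branches(parent, changed):
    """Assign a cluster label to every node of a rooted tree.

    parent:  dict mapping each node to its parent (the root maps to None).
    changed: set of nodes whose branch to their parent is flagged as a
             change in distribution (hypothetical, given as input here).
    """
    # Build child lists and locate the root.
    children = defaultdict(list)
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)

    # Depth-first traversal: a flagged branch opens a new cluster,
    # an unflagged branch copies the parent's cluster label.
    labels = {root: 0}
    next_label = 1
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child in changed:
                labels[child] = next_label
                next_label += 1
            else:
                labels[child] = labels[node]
            stack.append(child)
    return labels


if __name__ == "__main__":
    tree = {"A": None, "B": "A", "C": "A", "D": "B", "E": "B"}
    print(clusters_from_branches(tree, {"C", "E"}))
```

With the branches into `C` and `E` flagged, the nodes split into three clusters: {A, B, D}, {C}, and {E}.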