We consider the problem of clustering grouped data with possibly non-exchangeable groups whose dependencies can be characterized by a directed acyclic graph. To allow the sharing of clusters among the non-exchangeable groups, we propose a Bayesian nonparametric approach, termed the graphical Dirichlet process, that jointly models the dependent group-specific random measures by assuming each random measure to be distributed as a Dirichlet process whose concentration parameter and base probability measure depend on those of its parent groups. The resulting joint stochastic process respects the Markov property of the directed acyclic graph that links the groups. We characterize the graphical Dirichlet process using a novel hypergraph representation as well as the stick-breaking representation, the restaurant-type representation, and the representation as a limit of a finite mixture model. We develop an efficient posterior inference algorithm and illustrate our model with simulations and a real grouped single-cell dataset.
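To make the idea of parent-dependent Dirichlet processes concrete, the following is a minimal sketch and not the construction used in the paper. It assumes a single root group whose weights come from a truncated stick-breaking draw, a child base measure formed as the equal-weight mixture of its parents' measures, and a fixed concentration parameter per group; all three choices, and the names `stick_breaking`, `child_weights`, and `sample_dag`, are hypothetical. It relies on the standard fact that a Dirichlet process with a base measure supported on K atoms has Dirichlet-distributed weights on those atoms.

```python
import numpy as np

def stick_breaking(alpha, K, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # close the stick so the K weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def child_weights(alpha, base, rng):
    """Weights of DP(alpha, base) when base is discrete on K shared atoms."""
    # Tiny floor keeps every Dirichlet parameter strictly positive.
    return rng.dirichlet(np.clip(alpha * base, 1e-12, None))

def sample_dag(parents, alpha, K=20, seed=0):
    """Draw group-specific weights over K shared atoms along a DAG.

    Assumption (illustrative only): a child's base measure is the
    equal-weight mixture of its parents' measures.
    """
    rng = np.random.default_rng(seed)
    weights = {}
    for g in parents:  # keys are assumed to be in topological order
        if not parents[g]:
            weights[g] = stick_breaking(alpha[g], K, rng)
        else:
            base = np.mean([weights[p] for p in parents[g]], axis=0)
            weights[g] = child_weights(alpha[g], base, rng)
    return weights

# Diamond-shaped DAG: A -> B, A -> C, and both B and C -> D.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
alpha = {"A": 2.0, "B": 5.0, "C": 5.0, "D": 5.0}
weights = sample_dag(parents, alpha)
```

Because every group places its mass on the same K atoms, clusters are shared across groups, while each group's weights depend on the other groups only through its parents, which mirrors the Markov property described above.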