We consider the problem of clustering grouped data with possibly non-exchangeable groups whose dependencies can be characterized by a known directed acyclic graph. To allow the sharing of clusters among the non-exchangeable groups, we propose a Bayesian nonparametric approach, termed graphical Dirichlet process, that jointly models the dependent group-specific random measures by assuming that each random measure follows a Dirichlet process whose concentration parameter and base probability measure depend on those of its parent groups. The resulting joint stochastic process respects the Markov property of the directed acyclic graph that links the groups. We characterize the graphical Dirichlet process using a novel hypergraph representation, as well as the stick-breaking representation, the restaurant-type representation, and the representation as a limit of a finite mixture model. We develop an efficient posterior inference algorithm and illustrate our model with simulations and a real grouped single-cell dataset.
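To make the idea of parent-dependent Dirichlet processes concrete, the following is a minimal sketch of drawing truncated random measures on a small DAG via stick-breaking. It is not the authors' construction. Four choices are illustrative assumptions: each child's concentration is the sum of its parents' concentrations, and its base measure is the equal-weight mixture of its parents' random measures. Root nodes use a Normal(0, 3^2) base measure, and all measures are truncated to `K` sticks. The function names `gem_weights` and `sample_graphical_dp` are hypothetical. The sketch shows why clusters (atoms) are shared. A child's base measure is discrete, so every child atom is an atom already carried by one of its parents.

```python
import numpy as np


def gem_weights(alpha, K, rng):
    """Truncated stick-breaking (GEM) weights with K sticks."""
    betas = rng.beta(1.0, alpha, size=K)
    betas[-1] = 1.0  # last stick takes the remaining mass, so weights sum to 1
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * leftover


def sample_graphical_dp(parents, root_alpha, K=50, seed=0):
    """Draw truncated random measures for every node of a DAG (illustrative).

    parents: dict mapping node -> list of parents, given in topological order.
    root_alpha: dict mapping each root node -> its concentration parameter.
    Returns (atoms, alpha, weights): a shared atom pool, the concentration
    used at each node, and each node's weight vector over that pool.
    """
    rng = np.random.default_rng(seed)
    roots = [v for v, pa in parents.items() if not pa]
    # Each root draws K fresh atoms from the assumed base measure Normal(0, 3^2).
    atoms = rng.normal(0.0, 3.0, size=K * len(roots))
    M = atoms.size
    alpha, weights = {}, {}
    for v, pa in parents.items():
        if not pa:
            # Root: G_v ~ DP(alpha_v, H), truncated to its own block of K atoms.
            alpha[v] = root_alpha[v]
            start = K * roots.index(v)
            w = np.zeros(M)
            w[start:start + K] = gem_weights(alpha[v], K, rng)
        else:
            # Assumed rules: concentration = sum of parents' concentrations,
            # base measure = equal-weight mixture of the parents' measures.
            alpha[v] = sum(alpha[p] for p in pa)
            base = np.mean([weights[p] for p in pa], axis=0)
            base /= base.sum()
            sticks = gem_weights(alpha[v], K, rng)
            # Each stick lands on an atom drawn from the discrete base measure,
            # so a child can only reuse atoms that its parents already carry.
            idx = rng.choice(M, size=K, p=base)
            w = np.bincount(idx, weights=sticks, minlength=M)
        weights[v] = w
    return atoms, alpha, weights


# Example DAG: A and B are roots, C has parents A and B, D has parent C.
dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
atoms, alpha, weights = sample_graphical_dp(dag, {"A": 1.0, "B": 2.0})
shared_with_A = np.flatnonzero((weights["D"] > 0) & (weights["A"] > 0))
```

Here `shared_with_A` lists the atoms (clusters) that group D inherits from root group A through C. The support of each child's measure is always contained in the union of its parents' supports.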