In this paper, we introduce DisNPLBM, a novel distributed Markov Chain Monte Carlo (MCMC) inference method for the Bayesian Non-Parametric Latent Block Model (NPLBM), built on a Master/Worker architecture. Our non-parametric co-clustering algorithm partitions both observations (rows) and features (columns) using latent multivariate Gaussian block distributions. The row workload is distributed evenly among the workers, which communicate only with the master and never with one another. Experimental results show the effect of DisNPLBM on cluster labeling accuracy and execution time. Moreover, we present a real-world use case in which we apply our approach to co-cluster gene expression data. The source code is publicly available at https://github.com/redakhoufache/Distributed-NPLBM.
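To make the communication pattern concrete, the following is a minimal sketch of an even row split with master-side aggregation. It is not the DisNPLBM sampler: the helper names (`split_rows`, `worker_summary`, `master_combine`) and the choice of column sums as the per-worker summary are assumptions made purely for illustration, and the workers run sequentially here rather than on a cluster.

```python
import numpy as np


def split_rows(n_rows, n_workers):
    """Return contiguous row-index blocks whose sizes differ by at most one."""
    return np.array_split(np.arange(n_rows), n_workers)


def worker_summary(block):
    """Hypothetical worker step: reduce a row block to (row count, column sums)."""
    return block.shape[0], block.sum(axis=0)


def master_combine(summaries):
    """Hypothetical master step: merge worker summaries into global column means."""
    total = sum(count for count, _ in summaries)
    col_sum = sum(col_sums for _, col_sums in summaries)
    return col_sum / total


# Toy data: 10 observations (rows) by 3 features (columns).
rng = np.random.default_rng(0)
data = rng.normal(size=(10, 3))

# Each worker sees only its own rows; only summaries travel to the master.
blocks = split_rows(data.shape[0], n_workers=3)
summaries = [worker_summary(data[idx]) for idx in blocks]
global_mean = master_combine(summaries)
```

In this sketch each worker sends only a small summary to the master, so the traffic depends on the number of columns rather than on the number of rows held by the worker, and no worker ever needs to contact another.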