Discrete Bayesian networks (DBNs) provide a broadly useful framework for modeling dependence structures in multivariate categorical data. There is a vast literature on methods for inferring conditional probabilities and graphical structure in DBNs, but data sparsity and parametric assumptions remain major practical issues. In this article, we detail a comprehensive Bayesian framework for learning DBNs. First, we propose a hierarchical prior for the conditional probabilities that accommodates complicated interactions between parent variables while remaining stable in sparse regimes. We give a novel Markov chain Monte Carlo (MCMC) algorithm that uses parallel Langevin proposals to generate exact posterior samples, avoiding the pitfalls of variational approximations. Moreover, we verify that the full conditional distribution of the concentration parameters is log-concave under mild conditions, facilitating efficient sampling. We then propose two methods for learning network structures, including parent sets, Markov blankets, and directed acyclic graphs (DAGs), from categorical data. The first cycles through individual edges in each MCMC iteration, whereas the second updates the entire structure in a single step. We evaluate the accuracy, power, and MCMC performance of our methods in several simulation studies. Finally, we apply our methodology to uncover prognostic network structure from primary breast cancer samples.
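To make the sparsity issue concrete, the following is a minimal sketch of how a prior on the conditional probabilities stabilizes estimates when some parent configurations are rarely or never observed. It uses the standard Dirichlet-multinomial model with independent symmetric priors, not the hierarchical prior proposed in this article; the function name `cpt_posterior_mean`, the concentration value `alpha`, and the toy data are all illustrative assumptions.

```python
from collections import Counter
from itertools import product


def cpt_posterior_mean(child, parents, n_child_levels, parent_levels, alpha=1.0):
    """Posterior-mean conditional probability table (CPT) for one node.

    Assumes an independent symmetric Dirichlet prior for each parent
    configuration, with total concentration `alpha` (an assumed value,
    not one taken from the article).

    child:          list of child values in range(n_child_levels)
    parents:        list of tuples, one tuple of parent values per observation
    parent_levels:  number of levels of each parent variable
    """
    # Count joint occurrences of (parent configuration, child value).
    counts = Counter(zip(parents, child))
    cpt = {}
    # Enumerate every parent configuration, including unobserved ones.
    for config in product(*(range(m) for m in parent_levels)):
        total = sum(counts[(config, k)] for k in range(n_child_levels))
        # Posterior mean: (n_jk + alpha / K) / (n_j + alpha).
        cpt[config] = [
            (counts[(config, k)] + alpha / n_child_levels) / (total + alpha)
            for k in range(n_child_levels)
        ]
    return cpt


# Toy data: two binary parents and a binary child.
# The parent configuration (1, 1) never appears in the sample.
parents = [(0, 0), (0, 0), (0, 1), (1, 0), (1, 0), (1, 0)]
child = [0, 1, 1, 0, 0, 0]
cpt = cpt_posterior_mean(child, parents, 2, [2, 2], alpha=1.0)

for config, probs in sorted(cpt.items()):
    print(config, probs)
```

In this sketch the unobserved configuration falls back to the prior mean, whereas a maximum likelihood estimate would be undefined there. Because each configuration is smoothed independently, no information is shared across parent configurations, which is the kind of limitation a hierarchical prior is meant to address.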