Fast Bayesian High-Dimensional Gaussian Graphical Model Estimation

Graphical models describe associations between variables through the notion of conditional independence. Gaussian graphical models are a widely used class of such models in which the relationships are encoded by the nonzero entries of the precision matrix. In high-dimensional settings, however, standard covariance estimates are typically unstable. Moreover, in many realistic applications only a few significant associations are expected to be present, which motivates introducing sparsity into the estimation. Classical frequentist methods use penalization for this purpose; fully Bayesian methods, in contrast, are computationally slow, typically requiring iterative sampling over a quadratic number of parameters in a space constrained by positive definiteness. We propose a Bayesian graph estimation method based on an ensemble of Bayesian neighborhood regressions. An attractive feature of our method is that it parallelizes easily across separate graphical neighborhoods, yielding greater computational efficiency than most existing methods. Our strategy induces sparsity with a Horseshoe shrinkage prior and includes a novel variable selection step based on the marginal likelihood from the predictors' ranks. The method then combines the estimated regression coefficients to produce a graph estimate and a matrix of partial correlation estimates for inference. The performance of the various methods is assessed using measures such as the false discovery rate (FDR) and the true positive rate (TPR), and extensive simulations demonstrate competitive performance across a variety of cases. Lastly, we apply the method to investigate the dependence structure of gene expression in women with triple-negative breast cancer.
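
The step of combining neighborhood regression coefficients into partial correlations can be illustrated with a small sketch. The abstract does not state the combination rule, so the code below assumes the standard Gaussian identities: regressing variable i on all others gives coefficients beta_ij = -omega_ij / omega_ii, and the partial correlation satisfies rho_ij = sign(beta_ij) * sqrt(beta_ij * beta_ji). The function names, the toy precision matrix, and the zero tolerance are hypothetical choices for illustration. No Horseshoe prior or rank-based selection is implemented here; exact population coefficients stand in for the estimated ones.

```python
import math


def neighborhood_coefficients(omega):
    """Population coefficients from regressing each node on all the others.

    For a Gaussian vector with precision matrix omega, the regression of
    X_i on the remaining variables has beta[i][j] = -omega[i][j] / omega[i][i].
    (Assumption: exact coefficients stand in for estimated ones.)
    """
    p = len(omega)
    return [[0.0 if i == j else -omega[i][j] / omega[i][i] for j in range(p)]
            for i in range(p)]


def combine_to_partial_correlations(beta):
    """Merge the two coefficients of each pair into one partial correlation.

    Uses rho_ij = sign(beta_ij) * sqrt(beta_ij * beta_ji). If the product is
    not positive (possible with noisy estimates), this sketch sets rho_ij = 0.
    """
    p = len(beta)
    rho = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            prod = beta[i][j] * beta[j][i]
            if prod > 0.0:
                r = math.copysign(math.sqrt(prod), beta[i][j])
                rho[i][j] = rho[j][i] = r
    return rho


def edges(rho, tol=1e-12):
    """Return pairs (i, j), i < j, whose partial correlation exceeds tol."""
    p = len(rho)
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(rho[i][j]) > tol]


# Toy precision matrix for the chain graph 0 - 1 - 2 (hypothetical example).
omega = [[2.0, -1.0, 0.0],
         [-1.0, 2.0, -1.0],
         [0.0, -1.0, 2.0]]

beta = neighborhood_coefficients(omega)
rho = combine_to_partial_correlations(beta)
graph = edges(rho)
print(graph)
```

Each row of `beta` depends only on one node's neighborhood, so in an estimation setting the rows could be computed independently. That independence is what makes the parallelization across neighborhoods described above possible.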
