Bayesian graphical models are powerful tools for inferring complex relationships in high dimensions, yet they are often fraught with computational and statistical challenges. If exploited in a principled way, the growing body of information collected alongside the data of primary interest offers an opportunity to mitigate these difficulties by guiding the detection of dependence structures. For instance, gene network inference may be informed by publicly available summary statistics on the regulation of genes by genetic variants. Here we present a novel Gaussian graphical modelling framework to identify and leverage information on the centrality of nodes in conditional independence graphs. Specifically, we consider a fully joint hierarchical model to simultaneously infer (i) sparse precision matrices and (ii) the relevance of node-level information for uncovering the sought-after network structure. We encode such information as candidate auxiliary variables using a spike-and-slab submodel on the propensity of nodes to be hubs, which allows hypothesis-free selection and interpretation of a sparse subset of relevant variables. As real-world applications require efficient exploration of large posterior spaces, we develop a variational expectation conditional maximisation algorithm that scales inference to hundreds of samples, nodes and auxiliary variables. We illustrate and exploit the advantages of our approach in simulations and in a gene network study which identifies hub genes involved in biological pathways relevant to immune-mediated diseases.
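To make the idea of a spike-and-slab submodel on hub propensity more concrete, the following is a minimal generative sketch in plain Python. It is not the authors' model specification, which the abstract does not give. The probit link, the additive form `baseline + theta[i] + theta[j]` for edge inclusion, the binary auxiliary variables and all names (`simulate_hub_graph`, `slab_prob`, `slab_sd`, `baseline`) are assumptions chosen for illustration only. The sketch simulates a graph only; it does not construct a precision matrix or perform inference.

```python
import math
import random


def probit(x):
    """Standard normal CDF, used here as an assumed link function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_hub_graph(n_nodes=30, n_aux=5, slab_prob=0.2, slab_sd=1.0,
                       baseline=-2.0, seed=0):
    """Simulate an undirected graph whose hubs are driven by node-level
    auxiliary variables (illustrative assumptions throughout)."""
    rng = random.Random(seed)

    # Candidate node-level auxiliary variables (binary, for simplicity).
    V = [[rng.randint(0, 1) for _ in range(n_aux)] for _ in range(n_nodes)]

    # Spike-and-slab effects: exactly zero (spike) unless the variable is
    # selected, in which case the effect is drawn from a Gaussian slab.
    gamma = [rng.random() < slab_prob for _ in range(n_aux)]
    beta = [rng.gauss(0.0, slab_sd) if g else 0.0 for g in gamma]

    # Hub propensity of each node: linear in its auxiliary variables.
    theta = [sum(v * b for v, b in zip(row, beta)) for row in V]

    # Edge (i, j) is included with a probability that increases with the
    # propensities of both endpoints (assumed additive probit form).
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            p_edge = probit(baseline + theta[i] + theta[j])
            if rng.random() < p_edge:
                adj[i][j] = adj[j][i] = 1

    degrees = [sum(row) for row in adj]
    return V, gamma, beta, theta, adj, degrees
```

Calling `simulate_hub_graph(seed=1)` returns the auxiliary variables, the selection indicators, the effects, the node propensities, the adjacency matrix and the node degrees. Under these assumptions, nodes with a large positive propensity tend to have high degree, which is the kind of centrality signal the framework aims to detect and attribute to a sparse subset of auxiliary variables.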