Bayesian graphical models are powerful tools for inferring complex relationships in high dimensions, yet they are often fraught with computational and statistical challenges. If exploited in a principled way, the growing body of information collected alongside the data of primary interest offers an opportunity to mitigate these difficulties by guiding the detection of dependence structures. For instance, gene network inference may be informed by publicly available summary statistics on the regulation of genes by genetic variants. Here we present a novel Gaussian graphical modelling framework to identify and leverage information on the centrality of nodes in conditional independence graphs. Specifically, we consider a fully joint hierarchical model to simultaneously infer (i) sparse precision matrices and (ii) the relevance of node-level information for uncovering the sought-after network structure. We encode such information as candidate auxiliary variables using a spike-and-slab submodel on the propensity of nodes to be hubs, which allows hypothesis-free selection and interpretation of a sparse subset of relevant variables. As efficient exploration of large posterior spaces is needed for real-world applications, we develop a variational expectation conditional maximisation algorithm that scales inference to hundreds of samples, nodes and auxiliary variables. We illustrate and exploit the advantages of our approach in simulations and in a gene network study which identifies hub genes involved in biological pathways relevant to immune-mediated diseases.
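To make the central objects concrete, the following is a minimal sketch of how a sparse precision matrix encodes a conditional independence graph with a hub node. It is not the hierarchical model or the variational algorithm described above; it only illustrates the standard Gaussian graphical model fact that a zero off-diagonal precision entry corresponds to conditional independence. The function names (`hub_precision`, `partial_correlations`, `node_degrees`) and the values `p = 5` and `strength = 0.3` are assumptions chosen for illustration.

```python
import numpy as np

def hub_precision(p, hub=0, strength=0.3):
    """Hypothetical helper: precision matrix of a star graph in which
    node `hub` is connected to every other node."""
    omega = np.eye(p)
    others = [j for j in range(p) if j != hub]
    omega[hub, others] = strength
    omega[others, hub] = strength
    return omega

def partial_correlations(omega):
    """Partial correlations implied by a precision matrix:
    rho_ij = -omega_ij / sqrt(omega_ii * omega_jj) for i != j."""
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def node_degrees(omega, tol=1e-8):
    """Degree of each node in the conditional independence graph,
    i.e. the number of non-zero off-diagonal precision entries."""
    adj = np.abs(omega) > tol
    np.fill_diagonal(adj, False)
    return adj.sum(axis=1)

p = 5
omega = hub_precision(p)

# A valid precision matrix must be positive definite.
assert np.all(np.linalg.eigvalsh(omega) > 0)

degrees = node_degrees(omega)      # the hub has the largest degree
pcor = partial_correlations(omega)
sigma = np.linalg.inv(omega)       # implied covariance matrix

print("degrees:", degrees)
print("partial correlation (hub, node 1):", pcor[0, 1])
print("marginal covariance (node 1, node 2):", sigma[1, 2])
```

In this toy example the non-hub nodes are conditionally independent of one another given the hub (their precision entries are zero), yet they remain marginally correlated through it. This is why the graph is read off the precision matrix rather than the covariance matrix.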