Gaussian graphical models typically assume a homogeneous structure across all subjects, which is often restrictive in applications. In this article, we propose a weighted pseudo-likelihood approach for graphical modeling that allows different subjects to have different graphical structures depending on extraneous covariates. The pseudo-likelihood approach replaces the joint distribution by a product of the conditional distributions of each variable given the others. We cast each conditional distribution as a heteroscedastic regression problem, with covariate-dependent variance terms, so that information is borrowed directly from the data rather than through a hierarchical framework. This allows independent graphical modeling for each subject while retaining the benefits of a hierarchical Bayes model and remaining computationally tractable. We develop an efficient, embarrassingly parallel variational algorithm to approximate the posterior and obtain estimates of the graphs. Using a fractional variational framework, we derive asymptotic risk bounds for the estimate in terms of a novel variant of the $\alpha$-R\'{e}nyi divergence. We demonstrate theoretically the advantages of information borrowing across covariates over independent modeling. We show the practical advantages of the approach through simulation studies and illustrate the dependence structure in protein expression levels in breast cancer patients, using copy number variation (CNV) information as covariates.
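To make the weighted pseudo-likelihood idea concrete, the following is a minimal sketch and not the variational algorithm described above. It relies on the fact that a Gaussian regression whose noise variance for observation $i$ is $\sigma^2 / w_i$ is equivalent to weighted least squares with weights $w_i$. Everything else is an assumption made for illustration: the Gaussian kernel used for the weights, the ridge penalty standing in for a sparsity-inducing prior, the hard threshold used to read off edges, and all function and parameter names (`kernel_weights`, `weighted_nodewise_coefs`, `subject_graph`, `bandwidth`, `ridge`, `threshold`).

```python
import numpy as np


def kernel_weights(z, z0, bandwidth):
    """Gaussian kernel weights of every subject's covariates relative to z0.

    Assumed form: subjects with covariates close to z0 get larger weights.
    The weights are rescaled to sum to the number of subjects.
    """
    d2 = np.sum((z - z0) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)
    return w * len(w) / w.sum()


def weighted_nodewise_coefs(X, w, ridge=1e-2):
    """Regress each variable on all the others by weighted ridge regression.

    Row j of the returned matrix holds the coefficients of the regression
    of X[:, j] on the remaining columns; the diagonal is left at zero.
    """
    n, p = X.shape
    B = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        A = X[:, others]
        y = X[:, j]
        Aw = A * w[:, None]  # apply observation weights row by row
        G = A.T @ Aw + ridge * np.eye(p - 1)
        B[j, others] = np.linalg.solve(G, Aw.T @ y)
    return B


def subject_graph(X, z, i, bandwidth=0.5, ridge=1e-2, threshold=0.1):
    """Estimate a graph for subject i from covariate-weighted regressions.

    An edge (j, k) is kept when either of the two node-wise coefficients
    exceeds the threshold in absolute value (an "or" rule).
    """
    w = kernel_weights(z, z[i], bandwidth)
    B = weighted_nodewise_coefs(X, w, ridge)
    strength = np.maximum(np.abs(B), np.abs(B.T))
    adj = strength > threshold
    np.fill_diagonal(adj, False)
    return adj
```

Calling `subject_graph(X, z, i)` with an `n x p` data matrix `X` and an `n x q` covariate matrix `z` returns a symmetric boolean adjacency matrix for subject `i`. Because each subject's graph depends only on its own weight vector, the loop over subjects can be run in parallel, which is the sense in which such a scheme is embarrassingly parallel.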