Traditional Gaussian graphical models routinely assume that the data are homogeneous and that no additional variables affect the conditional independence structure. Modern genomic datasets, however, contain abundant auxiliary information that is often underutilized when determining the joint dependency structure. In this article, we consider a Bayesian approach to modeling the undirected graphs underlying heterogeneous multivariate observations with additional assistance from covariates. Building on product partition models, we propose a novel covariate-dependent Gaussian graphical model that allows graphs to vary with covariates, so that observations with similar covariates share a similar undirected graph. To efficiently embed Gaussian graphical models into the proposed framework, we explore both Gaussian likelihood and pseudo-likelihood functions. For the Gaussian likelihood, a G-Wishart distribution is used as a natural conjugate prior; for the pseudo-likelihood, a product of Gaussian conditionals is used. Moreover, the proposed model has large prior support and is flexible enough to approximate any $\nu$-H\"{o}lder conditional variance-covariance matrix with $\nu\in(0,1]$. We further show, based on the theory of fractional likelihood, that the rate of posterior contraction is minimax optimal when the true density is assumed to be a Gaussian mixture with a known number of components. The efficacy of the approach is demonstrated through simulation studies and an analysis of a protein network in a breast cancer dataset, with mRNA gene expression serving as covariates.
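To make the pseudo-likelihood idea concrete, the following is a minimal sketch of a Gaussian log pseudo-likelihood written as a sum of node-wise Gaussian conditional log-densities. It is an illustration under stated assumptions, not the construction used in the article: the data are assumed to be zero-mean, a single precision matrix `Omega` is shared by all observations (no covariate dependence or partitioning), and the function name `gaussian_log_pseudolikelihood` is hypothetical.

```python
import numpy as np


def gaussian_log_pseudolikelihood(X, Omega):
    """Log pseudo-likelihood of zero-mean data X (n x p) under precision Omega (p x p).

    Assumption (illustrative only): each conditional x_j | x_{-j} is Gaussian with
    mean -(1/omega_jj) * sum_{k != j} omega_jk * x_k and variance 1/omega_jj,
    and the pseudo-likelihood is the product of these p conditionals.
    """
    X = np.asarray(X, dtype=float)
    Omega = np.asarray(Omega, dtype=float)
    d = np.diag(Omega)  # conditional precisions omega_jj

    # For symmetric Omega, (X @ Omega)[i, j] / omega_jj equals
    # x_ij minus the conditional mean of x_ij given the other coordinates.
    resid = (X @ Omega) / d

    # Sum of Gaussian log-densities over all observations i and nodes j.
    log_terms = 0.5 * np.log(d) - 0.5 * np.log(2.0 * np.pi) - 0.5 * d * resid**2
    return float(np.sum(log_terms))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Omega = np.array([[2.0, 0.5, 0.0],
                      [0.5, 2.0, 0.5],
                      [0.0, 0.5, 2.0]])
    X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(Omega), size=200)
    print(gaussian_log_pseudolikelihood(X, Omega))
```

When `Omega` is diagonal, every conditional coincides with the corresponding marginal, so the pseudo-likelihood equals the full Gaussian likelihood. In general the two differ, which is why they are treated as separate options.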