Gaussian graphical models (GGMs) are widely used to recover the conditional independence structure among random variables. Several recent advances exploit an additional set of covariates to better estimate the GGM of the variables of interest. For example, in co-expression quantitative trait locus (eQTL) studies, both the mean expression levels of genes and their pairwise conditional independence structure may be adjusted by genetic variants local to those genes. Existing methods for estimating covariate-adjusted GGMs either allow only the mean to depend on covariates or require poor scaling assumptions, a consequence of the inherent non-convexity of simultaneously estimating the mean and the precision matrix. In this paper, we propose a convex formulation that jointly estimates the covariate-adjusted mean and precision matrix by using the natural parametrization of the multivariate Gaussian likelihood. This convexity yields theoretically better performance as the sparsity and dimension of the covariates grow large relative to the number of samples. We verify our theoretical results with numerical simulations and reanalyze an eQTL study of glioblastoma multiforme (GBM), an aggressive form of brain cancer.
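
To illustrate why the natural parametrization leads to convexity, the following is a minimal sketch of the standard Gaussian identity, not the paper's exact estimator. The symbols $y$, $\mu$, $\Sigma$, $\Theta$, $\eta$, $\Gamma$, and $x$ are notation assumed here for illustration; in particular, the linear covariate link $\eta = \Gamma x$ is a hypothetical choice that the abstract does not specify.

```latex
% Assumed notation: y ~ N(mu, Sigma), precision Theta = Sigma^{-1},
% natural parameter eta = Theta mu.
\begin{align*}
  -\log p(y \mid \mu, \Theta)
    &= -\tfrac12 \log\det\Theta
       + \tfrac12 (y-\mu)^{\top}\Theta\,(y-\mu) + \text{const} \\
    &= -\tfrac12 \log\det\Theta
       + \tfrac12\, y^{\top}\Theta\, y
       - \eta^{\top} y
       + \tfrac12\, \eta^{\top}\Theta^{-1}\eta + \text{const},
  \qquad \eta = \Theta\mu .
\end{align*}
% In (mu, Theta) the term mu^T Theta mu is not jointly convex
% (in one dimension, theta * mu^2 is not convex in (theta, mu)).
% In (eta, Theta) every term is jointly convex on Theta > 0:
% -log det Theta is convex, the middle terms are linear, and
% eta^T Theta^{-1} eta is the matrix-fractional function.
% Hypothetical covariate link (an assumption, not from the source):
%   eta = Gamma x,
% which keeps the objective jointly convex in (Gamma, Theta).
```

Under these assumptions, adding convex sparsity penalties on $\Gamma$ and $\Theta$ would preserve convexity, which is the kind of structure the abstract attributes to the proposed formulation.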