Accounting for dependence among high-dimensional variables is critical for accurate and reliable statistical inference in omics data analysis. Omics variables often exhibit structured correlation (co-expression) patterns, although these patterns are latent. However, few methods explicitly account for such structured dependence in the statistical analysis of omics data (e.g., differential expression analysis). To address this methodological gap, we propose Co-expression network multivariate Regression (CoReg), which integrates co-expression network structure into multivariate regression analysis to precisely account for the inter-correlations (dependence) among omics variables. We show in simulations that CoReg substantially improves the accuracy of statistical inference and replicability across studies. These findings suggest that CoReg provides an alternative approach to omics data association analysis with dependence adjustment, analogous to the role of mixed-effects models in handling repeated measures in lower-dimensional settings.