High-throughput sequencing technologies have enabled the collection of large-scale longitudinal -omics data, providing new opportunities to study co-expression networks among molecular nodes such as genes and proteins. However, the high dimensionality and temporal dependence inherent in such data require specialized statistical methods. We propose a novel approach, DCENt, for inferring dynamic co-expression networks among features over time: each node (feature) is modeled with a mixed-effects model, and dependencies among nodes are captured through correlated random effects. We develop two penalized algorithms that build on state-of-the-art thresholding covariance estimators to estimate the random-effects covariance structure. Simulation studies show improved performance over existing approaches in terms of both mean squared error and mean absolute error. We further apply the methods to data from the CARDIA study to investigate how protein co-expression networks evolve over time, as well as the associations among protein trajectory patterns.
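To make the idea of a thresholding covariance estimator concrete, the following is a minimal sketch in Python. It is not the DCENt algorithm: it applies generic soft-thresholding to the off-diagonal entries of a sample covariance matrix computed from simulated random intercepts. The function name `soft_threshold_covariance`, the threshold `lam = 0.2`, and the toy covariance are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold_covariance(cov, lam):
    """Soft-threshold the off-diagonal entries of a covariance matrix.

    Hypothetical helper for illustration; not taken from the paper.
    Each off-diagonal entry s is replaced by sign(s) * max(|s| - lam, 0),
    while the variances on the diagonal are left untouched.
    """
    cov = np.asarray(cov, dtype=float)
    shrunk = np.sign(cov) * np.maximum(np.abs(cov) - lam, 0.0)
    np.fill_diagonal(shrunk, np.diag(cov))
    return shrunk


# Toy setting (assumed): 4 features whose random intercepts are
# independent except for one correlated pair (features 0 and 1).
true_cov = np.eye(4)
true_cov[0, 1] = true_cov[1, 0] = 0.6

rng = np.random.default_rng(0)
random_intercepts = rng.multivariate_normal(np.zeros(4), true_cov, size=500)

# Sample covariance across subjects (rows are subjects, columns are features).
sample_cov = np.cov(random_intercepts, rowvar=False)

# Thresholding removes small, noisy off-diagonal entries.
sparse_cov = soft_threshold_covariance(sample_cov, lam=0.2)

# Nonzero off-diagonal entries define the edges of the estimated network.
adjacency = (sparse_cov != 0) & ~np.eye(4, dtype=bool)
print(adjacency.astype(int))
```

In this sketch the threshold is fixed by hand; in practice it would typically be chosen by cross-validation or a similar data-driven criterion, and the random effects would be estimated from fitted mixed-effects models rather than observed directly.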