Interrogating how biological changes evolve at early stages of life requires longitudinal profiling of molecules, such as DNA methylation, which can be challenging to collect from children. We introduce a probabilistic, longitudinal machine learning framework based on multi-mean Gaussian processes (GPs) that accounts for correlations across individuals and genes over time. The method predicts an individual's future DNA methylation status at different ages while accounting for uncertainty. Our model is trained on a birth cohort of children with methylation profiled at ages 0-4, and we demonstrate that the status of methylation sites for each child can be accurately predicted at ages 5-7. We show that methylation profiles predicted by multi-mean GPs can be used to estimate other phenotypes, such as epigenetic age, and enable comparison with other health measures of interest. This approach encourages epigenetic studies to move towards longitudinal designs for investigating epigenetic changes during development, ageing and disease progression.
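To make the idea of GP-based forecasting with uncertainty concrete, the following is a minimal sketch and not the authors' multi-mean GP model. It assumes a plain GP regression with a squared-exponential kernel, a constant prior mean set per child (a simplified stand-in for an individual-specific mean), and made-up methylation values for a single site. The names `rbf_kernel`, `gp_predict`, `child_a` and `child_b`, and all hyperparameters, are hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=2.0, variance=0.05):
    """Squared-exponential covariance between two vectors of ages (years)."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_predict(train_ages, train_values, test_ages, noise=1e-3):
    """Return the GP posterior mean and standard deviation at test_ages.

    Assumption: the prior mean is a constant equal to this child's training
    average, a simplified stand-in for an individual-specific mean function.
    """
    mean = train_values.mean()
    k_tt = rbf_kernel(train_ages, train_ages) + noise * np.eye(len(train_ages))
    k_st = rbf_kernel(test_ages, train_ages)
    k_ss = rbf_kernel(test_ages, test_ages)

    # Solve with a Cholesky factor instead of inverting the covariance matrix.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, train_values - mean))
    post_mean = mean + k_st @ alpha

    v = np.linalg.solve(chol, k_st.T)
    post_var = np.diag(k_ss) - np.sum(v ** 2, axis=0)
    return post_mean, np.sqrt(np.clip(post_var, 0.0, None))


# Made-up methylation values for one site, observed at ages 0-4.
train_ages = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
children = {
    "child_a": np.array([0.62, 0.64, 0.67, 0.69, 0.70]),
    "child_b": np.array([0.48, 0.47, 0.45, 0.44, 0.44]),
}
test_ages = np.array([5.0, 6.0, 7.0])

# Forecast each child separately at ages 5-7.
predictions = {
    name: gp_predict(train_ages, values, test_ages)
    for name, values in children.items()
}

for name, (post_mean, post_std) in predictions.items():
    for age, m, s in zip(test_ages, post_mean, post_std):
        print(f"{name} age {age:.0f}: {m:.3f} +/- {2 * s:.3f}")
```

In this sketch each child is modelled independently, so nothing is shared across individuals or genes. The framework described in the abstract accounts for those correlations, which this example does not attempt to reproduce. The predictive standard deviation widens as the forecast age moves further from the last observation, which is the kind of uncertainty a probabilistic forecast is meant to expose.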