A general framework for penalized mixed-effects multitask learning with applications on DNA methylation surrogate biomarkers creation

Recent evidence highlights the usefulness of DNA methylation (DNAm) biomarkers as surrogates for exposure to risk factors for non-communicable diseases in epidemiological studies and randomized trials. DNAm variability has been shown to be tightly related to lifestyle behavior and exposure to environmental risk factors, ultimately providing an unbiased proxy of an individual's state of health. At present, the creation of DNAm surrogates relies on univariate penalized regression models, with the elastic-net regularizer being the gold standard for the task. Nonetheless, more advanced modeling procedures are required in the presence of multivariate outcomes with a structured dependence pattern among the study samples. In this work we propose a general framework for mixed-effects multitask learning in the presence of high-dimensional predictors to develop a multivariate DNAm biomarker from a multi-center study. A penalized estimation scheme based on an expectation-maximization algorithm is devised, in which any penalty criterion for fixed-effects models can be conveniently incorporated in the fitting process. We apply the proposed methodology to create novel DNAm surrogate biomarkers for multiple correlated risk factors for cardiovascular diseases and comorbidities. We show that the proposed approach, modeling multiple outcomes together, outperforms state-of-the-art alternatives, both in predictive power and in bio-molecular interpretation of the results.
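The idea of plugging a fixed-effects penalty into an expectation-maximization fit can be illustrated with a deliberately simplified sketch. It is not the multivariate algorithm of the paper: it assumes a single outcome, one random intercept per study center, and a lasso penalty solved by coordinate descent. All function and variable names (`lasso_cd`, `penalized_em`, `groups`, `lam`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, beta, n_sweeps=50):
    """Coordinate descent for (1/2n)||y - X beta||^2 + lam * ||beta||_1.

    Any other fixed-effects penalized solver could be swapped in here.
    """
    n = X.shape[0]
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_sweeps):
        for k in range(X.shape[1]):
            if col_sq[k] == 0.0:
                continue
            resid += X[:, k] * beta[k]          # remove feature k's contribution
            rho = X[:, k] @ resid / n
            beta[k] = soft_threshold(rho, lam) / col_sq[k]
            resid -= X[:, k] * beta[k]          # add the updated contribution back
    return beta


def penalized_em(X, y, groups, lam=0.05, n_iter=30):
    """EM-style fit of y = X beta + b[group] + noise, with b ~ N(0, tau2).

    Assumption: random intercept only, Gaussian noise with variance sigma2.
    """
    n, p = X.shape
    _, idx = np.unique(groups, return_inverse=True)
    counts = np.bincount(idx)
    beta = np.zeros(p)
    tau2, sigma2 = 1.0, 1.0
    for _ in range(n_iter):
        # E-step: posterior mean m and variance v of each random intercept.
        r = y - X @ beta
        v = 1.0 / (counts / sigma2 + 1.0 / tau2)
        m = v * np.bincount(idx, weights=r) / sigma2
        # M-step (fixed effects): penalized regression on the outcome
        # with the expected random effect subtracted.
        beta = lasso_cd(X, y - m[idx], lam, beta)
        # M-step (variance components): closed-form updates.
        r = y - X @ beta
        tau2 = np.mean(m ** 2 + v)
        sigma2 = (np.sum((r - m[idx]) ** 2) + np.sum(counts * v)) / n
    return beta, m, tau2, sigma2
```

The point of the sketch is the structure of the M-step: once the expected random effects are subtracted from the outcome, the fixed-effects update is an ordinary penalized regression, so the lasso solver above could be replaced by an elastic-net or any other penalized fixed-effects routine without touching the E-step.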
