Tutorial on survival modeling with applications to omics data

Motivation: Identification of genomic, molecular and clinical markers prognostic of patient survival is important for developing personalized disease prevention, diagnostic and treatment approaches. Modern omics technologies have made it possible to investigate the prognostic impact of markers at multiple molecular levels, including genomics, epigenomics, transcriptomics, proteomics and metabolomics, and how these potential risk factors complement clinical characterization of patient outcomes for survival prognosis. However, the massive size of omics data sets, along with their correlation structures, poses challenges for studying relationships between the molecular information and patients' survival outcomes.

Results: We present a general workflow for survival analysis that accepts high-dimensional omics data as input for identifying survival-associated features and validating survival models. In particular, we focus on the commonly used Cox-type penalized regressions and hierarchical Bayesian models for feature selection in survival analysis, which are especially useful for high-dimensional data, but the framework is applicable more generally.

Availability and implementation: A step-by-step R tutorial using The Cancer Genome Atlas survival and omics data for the execution and evaluation of survival models has been made available at https://ocbe-uio.github.io/survomics/survomics.html.

