As in many fields of medical research, survival analysis has seen growing interest in applying deep learning techniques to model complex, high-dimensional, heterogeneous, incomplete, and censored medical data. Current methods often make assumptions about the relationships in the data that may not hold in practice. In response, we introduce SAVAE (Survival Analysis Variational Autoencoder), a novel approach based on Variational Autoencoders. Its main contribution is an ELBO (evidence lower bound) formulation tailored to survival analysis that supports various parametric distributions for both the covariates and the survival time, provided the log-likelihood is differentiable. It offers a general method that performs consistently well across several metrics and shows robustness and stability across different experiments. Our proposal effectively estimates time-to-event while accounting for censoring, covariate interactions, and time-varying risk associations. We validate our model on diverse datasets, including genomic, clinical, and demographic data, with varying levels of censoring. The approach achieves performance competitive with state-of-the-art techniques, as assessed by the Concordance Index and the Integrated Brier Score. SAVAE is also interpretable, since it models covariates and time parametrically. Moreover, its generative architecture enables further applications such as clustering, data imputation, and the generation of synthetic patient data through latent space inference from survival data.
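The abstract does not spell out the ELBO or the time distribution used. As an illustration only of how a differentiable parametric likelihood can account for right-censoring, the following minimal sketch assumes a Weibull time model: an observed event contributes the log-density, while a censored patient contributes only the log-survival probability. The function name and its parameters are hypothetical and are not taken from SAVAE's implementation.

```python
import math


def weibull_censored_loglik(t, event, shape, scale):
    """Log-likelihood of one patient under an assumed Weibull time model.

    t            : observed time (event time, or last follow-up if censored), > 0
    event        : 1 if the event was observed, 0 if right-censored
    shape, scale : Weibull parameters (hypothetical decoder outputs), both > 0
    """
    # Cumulative hazard H(t) = (t / scale) ** shape, so log S(t) = -H(t).
    log_survival = -((t / scale) ** shape)
    if event:
        # log h(t) = log(shape) - log(scale) + (shape - 1) * log(t / scale)
        log_hazard = (
            math.log(shape)
            - math.log(scale)
            + (shape - 1.0) * (math.log(t) - math.log(scale))
        )
        # Observed event: log f(t) = log h(t) + log S(t).
        return log_hazard + log_survival
    # Censored: we only know the patient survived past t.
    return log_survival


# Same follow-up time, with and without an observed event.
observed = weibull_censored_loglik(t=2.0, event=1, shape=1.0, scale=2.0)
censored = weibull_censored_loglik(t=2.0, event=0, shape=1.0, scale=2.0)
```

In a variational autoencoder setting, a term of this kind would typically be summed over patients and combined with the covariate reconstruction term and the KL divergence; how SAVAE weights or combines these terms is not stated in the abstract.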