Time-to-event analysis, also known as survival analysis, aims to predict the time at which an event occurs, given a set of features. A major challenge in this area is handling censored data, which complicates the design of learning algorithms. Traditional methods such as Cox's proportional hazards model and the accelerated failure time (AFT) model are popular in this field, but they often rely on assumptions such as proportional hazards and linearity. In particular, AFT models often require a pre-specified parametric distribution. To improve predictive performance and relax these strict assumptions, many deep learning approaches for hazard-based models have been proposed in recent years. However, representation learning for AFT has not been widely explored in the neural network literature, despite its simplicity and interpretability compared with hazard-focused methods. In this work, we introduce the Deep AFT Rank-regression model for Time-to-event prediction (DART). The model uses an objective function based on Gehan's rank statistic, which is efficient and reliable for representation learning. Besides removing the need to specify a baseline event time distribution, DART retains the advantage of standard AFT models of predicting event time directly. The proposed method is a semiparametric approach to AFT modeling that imposes no distributional assumptions on the survival time. Unlike existing neural network-based AFT models, it also requires no additional hyperparameters or complex model architectures. Through quantitative analysis on various benchmark datasets, we show that DART has significant potential for modeling high-throughput censored time-to-event data.
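To make the rank-based objective concrete, the following is a minimal sketch of a Gehan-type loss in its standard textbook form, computed on residuals of log event times. It is an illustration under stated assumptions, not the DART implementation: the function name `gehan_loss`, the argument names, and the `1/n^2` normalization are choices made here, and `pred` stands in for the output of any regression model, such as a neural network.

```python
import numpy as np


def gehan_loss(log_time, event, pred):
    """Gehan-type rank loss on AFT residuals (illustrative sketch).

    log_time : log of observed times (event or censoring), shape (n,)
    event    : 1 if the event was observed, 0 if censored, shape (n,)
    pred     : model prediction of log time, shape (n,)
    """
    log_time = np.asarray(log_time, dtype=float)
    event = np.asarray(event, dtype=float)
    pred = np.asarray(pred, dtype=float)

    # Residuals e_i = log(T_i) - f(x_i)
    resid = log_time - pred

    # diff[i, j] = e_i - e_j for every pair (i, j)
    diff = resid[:, None] - resid[None, :]

    # Negative part: max(0, e_j - e_i)
    neg_part = np.maximum(-diff, 0.0)

    # Only pairs whose first member i has an observed event contribute
    n = resid.shape[0]
    return float((event[:, None] * neg_part).sum() / n**2)


# Hypothetical toy data: two subjects, the first with an observed event
loss = gehan_loss(log_time=[1.0, 2.0], event=[1, 0], pred=[0.0, 0.0])
```

Because the loss depends only on pairwise differences of residuals, adding a constant to every prediction leaves it unchanged. No baseline distribution for the residuals enters the computation, which is consistent with the semiparametric setting described above.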