Censoring is the central problem in survival analysis: for each sample, either the time-to-event (for instance, death) or the time-to-censoring (such as loss to follow-up) is observed, but not both. The majority of existing machine learning-based survival analysis methods assume that survival is conditionally independent of censoring given a set of covariates, an assumption that cannot be verified since only marginal distributions are available from the data. The existence of dependent censoring, along with the inherent bias in current estimators, has been demonstrated in a variety of applications, accentuating the need for a more nuanced approach. However, existing methods that adjust for dependent censoring require practitioners to specify the ground-truth copula. This requirement poses a significant challenge for practical applications, as model misspecification can lead to substantial bias. In this work, we propose a flexible deep learning-based survival analysis method that simultaneously accommodates dependent censoring and eliminates the requirement for specifying the ground-truth copula. We theoretically prove the identifiability of our model under a broad family of copulas and survival distributions. Experimental results from a wide range of datasets demonstrate that our approach successfully discerns the underlying dependency structure and significantly reduces survival estimation bias when compared to existing methods.