In many applications, it is important to identify subpopulations that survive longer or shorter than the rest of the population. In medicine, for example, this helps determine which patients benefit from a treatment; in predictive maintenance, it reveals which components are more likely to fail. Existing methods for discovering subgroups with exceptional survival characteristics rely on restrictive assumptions about the survival model (e.g., proportional hazards), require pre-discretized features, and, because they compare average statistics, tend to overlook individual deviations. In this paper, we propose Sysurv, a fully differentiable, non-parametric method that leverages random survival forests to learn individual survival curves and automatically learns conditions, and how to combine them into inherently interpretable rules, so as to select subgroups with exceptional survival characteristics. Empirical evaluation on a wide range of datasets and settings, including a case study on cancer data, shows that Sysurv reveals insightful and actionable survival subgroups.
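To make the notion of a rule-defined survival subgroup concrete, the following is a minimal sketch and not Sysurv's actual implementation. It assumes a rule is a conjunction of interval conditions, relaxes each condition with sigmoids so that membership is differentiable in the interval bounds, and compares a membership-weighted Kaplan-Meier estimate of the subgroup with that of the whole population. All names (`soft_interval`, `soft_rule`, `weighted_kaplan_meier`), the temperature parameter, and the toy data are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_interval(x, lower, upper, temperature=0.5):
    """Soft membership of x in [lower, upper]; sharper as temperature shrinks."""
    return sigmoid((x - lower) / temperature) * sigmoid((upper - x) / temperature)


def soft_rule(row, conditions, temperature=0.5):
    """Soft conjunction (product) of conditions given as {feature: (lower, upper)}."""
    membership = 1.0
    for feature, (lower, upper) in conditions.items():
        membership *= soft_interval(row[feature], lower, upper, temperature)
    return membership


def weighted_kaplan_meier(times, events, weights):
    """Weighted Kaplan-Meier estimate; returns [(t, S(t))] at distinct event times."""
    curve, surv = [], 1.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        died = sum(w for ti, e, w in zip(times, events, weights) if ti == t and e)
        if at_risk > 0:
            surv *= 1.0 - died / at_risk
        curve.append((t, surv))
    return curve


# Toy data (invented): one feature, follow-up time, and event flag (0 = censored).
rows = [{"age": a} for a in (40, 45, 50, 55, 70, 72, 75, 80)]
times = [10, 12, 9, 11, 3, 4, 2, 5]
events = [1, 0, 1, 1, 1, 1, 1, 0]

# Hypothetical rule: "age in [0, 60]".
rule = {"age": (0.0, 60.0)}
membership = [soft_rule(r, rule) for r in rows]

subgroup_curve = dict(weighted_kaplan_meier(times, events, membership))
population_curve = dict(weighted_kaplan_meier(times, events, [1.0] * len(rows)))

for t in sorted(population_curve):
    print(f"t={t:>2}  subgroup S(t)={subgroup_curve[t]:.3f}  "
          f"population S(t)={population_curve[t]:.3f}")
```

In this toy example the subgroup's estimated survival stays above the population's at every event time, which is the kind of deviation a subgroup discovery method would search for. Because the membership weights are smooth functions of the interval bounds, a gradient-based optimizer could in principle adjust those bounds; how Sysurv actually parameterizes and optimizes its rules is not specified in the text above.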