Survival analysis is widely used to model time-to-event data in which some observations are censored, particularly in healthcare, where it is used to predict future patient risk. In such settings, survival models must be both accurate and interpretable so that users (such as doctors) can trust the model and understand its predictions. While most of the literature focuses on discrimination, interpretability is equally important. A successful interpretable model should describe how changing each feature affects the outcome and should use only a small number of features. In this paper, we present DyS (pronounced ``dice''), a new survival analysis model that achieves both strong discrimination and interpretability. DyS is a feature-sparse Generalized Additive Model that combines feature selection and interpretable prediction in a single model. While DyS is applicable to any survival analysis problem, it is particularly useful for survival datasets that are large in both the number of samples $n$ and the number of features $p$, such as those commonly found in observational healthcare studies. Empirical studies show that DyS is competitive with other state-of-the-art machine learning models for survival analysis while remaining highly interpretable.