Survival analysis is a statistical framework for modeling time-to-event data. It plays a pivotal role in medicine, reliability engineering, and social science research, where understanding event dynamics is critical even when few data samples are available. Recent advances in machine learning, particularly those based on neural networks and decision trees, have introduced sophisticated algorithms for survival modeling. However, many of these methods rely on restrictive assumptions about the underlying event-time distribution, such as proportional hazards, time discretization, or accelerated failure time. In this study, we propose FPBoost, a survival model that combines a weighted sum of fully parametric hazard functions with gradient boosting. The distribution parameters are estimated by decision trees trained to maximize the full survival likelihood. We show that FPBoost is a universal approximator of hazard functions, offering full event-time modeling flexibility while maintaining interpretability through the use of well-established parametric distributions. We evaluate the concordance and calibration of FPBoost on multiple benchmark datasets, showcasing its robustness and versatility as a new tool for survival estimation.
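To make the idea of a weighted sum of parametric hazards concrete, the following is a minimal sketch and not the FPBoost implementation. It assumes Weibull components only, fixed (not tree-predicted) parameters, and right-censored data; the function names and the choice of NumPy are illustrative assumptions. It relies on two standard facts: the cumulative hazard of a weighted sum of hazards is the weighted sum of the component cumulative hazards, and the full survival log-likelihood under right censoring is the sum over samples of delta_i * log h(t_i) - H(t_i), where delta_i is the event indicator.

```python
import numpy as np

# Illustrative sketch only: Weibull components with fixed parameters.
# In a boosted model these parameters would be predicted per sample;
# here they are plain arrays (an assumption made for brevity).

def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (k / lam) * (t / lam)^(k - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def weibull_cum_hazard(t, shape, scale):
    """Weibull cumulative hazard H(t) = (t / lam)^k."""
    return (t / scale) ** shape

def mixture_hazard(t, weights, shapes, scales):
    """Weighted sum of component hazards evaluated at times t."""
    t = np.asarray(t, dtype=float)
    return sum(w * weibull_hazard(t, k, lam)
               for w, k, lam in zip(weights, shapes, scales))

def mixture_cum_hazard(t, weights, shapes, scales):
    """Integral of the weighted hazard: weighted sum of cumulative hazards."""
    t = np.asarray(t, dtype=float)
    return sum(w * weibull_cum_hazard(t, k, lam)
               for w, k, lam in zip(weights, shapes, scales))

def survival(t, weights, shapes, scales):
    """Survival function S(t) = exp(-H(t))."""
    return np.exp(-mixture_cum_hazard(t, weights, shapes, scales))

def neg_log_likelihood(times, events, weights, shapes, scales):
    """Negative full survival log-likelihood for right-censored data.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    events = np.asarray(events, dtype=float)
    h = mixture_hazard(times, weights, shapes, scales)
    H = mixture_cum_hazard(times, weights, shapes, scales)
    return -np.sum(events * np.log(h) - H)

# Example usage with two hypothetical components.
times = np.array([1.0, 2.5, 4.0])
events = np.array([1, 0, 1])
nll = neg_log_likelihood(times, events,
                         weights=[0.7, 0.3],
                         shapes=[1.5, 0.8],
                         scales=[3.0, 5.0])
```

A gradient boosting procedure would minimize a loss of this form with respect to the component parameters and weights; the sketch only evaluates it for given values.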