Survival analysis is a widely used technique for analyzing time-to-event data in the presence of censoring. In recent years, numerous survival analysis methods have emerged that scale to large datasets and relax traditional assumptions such as proportional hazards. Although these models perform well, they are highly sensitive to hyperparameters, including (1) the number of bins and the bin size for discrete models and (2) the number of cluster assignments for mixture-based models. Each of these choices requires extensive tuning by practitioners to achieve optimal performance. In addition, we demonstrate in empirical studies that (1) the optimal bin size may differ drastically depending on the metric of interest (e.g., concordance vs. Brier score), and (2) mixture models may suffer from mode collapse and numerical instability. We propose a survival analysis approach that eliminates the need to tune hyperparameters such as mixture assignments and bin sizes, reducing the burden on practitioners. We show that the proposed approach matches or outperforms baselines on several real-world datasets.
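To make the bin-count hyperparameter concrete, the following minimal sketch shows how a discrete-time model might map continuous event times to bin indices. It assumes equal-width bins on the interval from 0 to the largest observed time; the abstract does not specify a binning scheme, and the helper names `equal_width_edges` and `assign_bins` are hypothetical, chosen only for illustration. It is not the proposed method.

```python
from bisect import bisect_right


def equal_width_edges(times, n_bins):
    """Return n_bins + 1 equally spaced edges from 0 to max(times).

    Equal-width binning is an assumption made for this illustration.
    """
    t_max = max(times)
    return [t_max * k / n_bins for k in range(n_bins + 1)]


def assign_bins(times, edges):
    """Map each time to the 0-based index of the bin that contains it."""
    n_bins = len(edges) - 1
    # bisect_right counts the edges that are <= t; subtracting 1 gives the
    # bin index. The clamp puts t == max(times) into the last bin.
    return [min(bisect_right(edges, t) - 1, n_bins - 1) for t in times]


# Toy event times, invented for this example.
times = [1.0, 2.5, 4.0, 7.5, 10.0]

coarse = assign_bins(times, equal_width_edges(times, 2))
fine = assign_bins(times, equal_width_edges(times, 5))

print(coarse)  # [0, 0, 0, 1, 1]
print(fine)    # [0, 1, 2, 3, 4]
```

With two bins, the first three event times become indistinguishable to the model, while five bins separate all of them. The choice of `n_bins` therefore changes the targets a discrete model is trained on, which is one way the sensitivity described above can arise.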