Recently developed survival analysis methods improve upon existing approaches by predicting the probability of event occurrence in each of a number of pre-specified (discrete) time intervals. By avoiding strong parametric assumptions on the event density, this approach tends to improve prediction performance, particularly when data are plentiful. However, in clinical settings with limited available data, it is often preferable to judiciously partition the event time space into a small number of intervals well suited to the prediction task at hand. In this work, we develop a method to learn from data a set of cut points defining such a partition. We show that, on two simulated datasets, the method recovers intervals that match the underlying generative model. We then demonstrate improved prediction performance on three real-world observational datasets, including a large, newly harmonized stroke risk prediction dataset. Finally, we argue that our approach facilitates clinical decision-making by suggesting the time intervals most appropriate for each task, in the sense that they enable more accurate risk prediction.
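To make the discrete-time setting concrete, the following is a minimal sketch of how a fixed set of cut points partitions the event time space and how a per-interval probability prediction can be scored. It is not the cut-point learning method developed in this work: the cut points here are given rather than learned, the function names `interval_index` and `discrete_nll` are hypothetical, and the treatment of right-censored subjects (the event is assumed to fall in or after the interval containing the censoring time) is one simplifying convention among several.

```python
import numpy as np

def interval_index(times, cut_points):
    """Map each time to the index of the interval containing it.

    K sorted interior cut points c_1 < ... < c_K define K + 1 intervals:
    [0, c_1), [c_1, c_2), ..., [c_K, inf).
    """
    return np.searchsorted(np.asarray(cut_points), np.asarray(times), side="right")

def discrete_nll(probs, times, events, cut_points):
    """Mean negative log-likelihood of a discrete-time survival model.

    probs  : (n, K + 1) array; each row is a predicted probability mass
             function over the K + 1 intervals.
    times  : observed event or censoring times.
    events : 1 if the event was observed, 0 if right-censored.

    Assumption (illustrative): a subject censored in interval k contributes
    the probability that the event occurs in interval k or later.
    """
    probs = np.asarray(probs, dtype=float)
    events = np.asarray(events, dtype=bool)
    k = interval_index(times, cut_points)
    rows = np.arange(len(k))
    # tail[i, j] = sum of probs[i, m] over m >= j
    tail = np.cumsum(probs[:, ::-1], axis=1)[:, ::-1]
    likelihood = np.where(events, probs[rows, k], tail[rows, k])
    return -np.mean(np.log(likelihood))

# Toy example: two cut points give three intervals.
cut_points = [1.0, 3.0]
times = [0.5, 2.0, 5.0]
events = [1, 0, 1]          # the second subject is censored
probs = np.tile([0.2, 0.3, 0.5], (3, 1))

idx = interval_index(times, cut_points)
nll = discrete_nll(probs, times, events, cut_points)
```

Moving a cut point changes which interval each subject falls into and therefore changes this score, which is the sense in which the choice of partition affects prediction quality.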