The introduction of machine learning (ML) techniques to survival analysis has increased the flexibility of modeling approaches, and ML-based models have become state-of-the-art. These models are trained by optimizing their own cost functions, yet their performance is often evaluated using the concordance index (C-index). From a statistical learning perspective, it is therefore important to analyze the relationship between the optimizers of the C-index and those of the ML cost functions. We address this issue by providing C-index Fisher-consistency results and excess risk bounds for several cost functions commonly used in survival analysis. We identify conditions under which these cost functions are consistent, expressed as three nested families of survival models. We also study the general case where no model assumption is made and present a new, off-the-shelf method that is shown to be consistent with the C-index, although it is computationally expensive at inference. Finally, we perform limited numerical experiments on simulated data to illustrate our theoretical findings.
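For readers unfamiliar with the evaluation metric, the following is a minimal sketch of one common empirical estimator of the C-index (Harrell's estimator), not the authors' implementation. The function name `harrell_c_index` and its arguments are hypothetical, and the sketch assumes that a higher score means a higher predicted risk, that pairs with tied observed times are skipped, and that tied scores count as half-concordant.

```python
from itertools import combinations


def harrell_c_index(times, events, scores):
    """Empirical C-index sketch (Harrell's estimator).

    times  : observed times (event or censoring time)
    events : 1/True if the event was observed, 0/False if censored
    scores : predicted risk scores (assumption: higher = shorter survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Reorder the pair so that index i has the shorter observed time.
        if times[j] < times[i]:
            i, j = j, i
        # A pair is comparable only if the times differ and the shorter
        # time corresponds to an observed event (simplifying assumption:
        # pairs with tied times are ignored).
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if scores[i] > scores[j]:
            concordant += 1.0   # earlier event received the higher risk
        elif scores[i] == scores[j]:
            concordant += 0.5   # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Illustrative toy data (made up for this sketch, not from the paper).
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [4, 3, 2, 1]))
```

The estimator depends on the scores only through how they rank comparable pairs, which is why a model trained on a different cost function is not automatically an optimizer of the C-index.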