In survival analysis, the estimation of the proportion of subjects who will never experience the event of interest, termed the cure rate, has received considerable attention recently. Estimation is particularly difficult when follow-up is insufficient, that is, when the censoring mechanism has a smaller support than the distribution of the target data. For that case, non-parametric estimators were recently proposed using extreme value methodology, assuming that the distribution of the susceptible population is in the Fr\'echet or Gumbel max-domain of attraction. In this paper, we take the extreme value techniques one step further and jointly estimate the cure rate and the extreme value index using probability plotting methodology, in particular using the full information contained in the top order statistics. In other words, we reconstruct the immune proportion under both sufficient and insufficient follow-up. To this end, a Peaks-over-Threshold approach is proposed under the Gumbel max-domain assumption. Next, the approach is transferred to more specific models, such as Pareto, log-normal and Weibull tail models, making it possible to recognize the most important tail characteristics of the susceptible population. We establish the asymptotic behavior of our estimators under regularization. Through simulation studies, our estimators are shown to rival and often outperform established models, even when considering cure rate estimation alone. Finally, we provide an application of our method to Norwegian birth registry data.