Multicalibration for Censored Survival Data: Towards Universal Adaptability in Predictive Modeling

Traditional statistical and machine learning methods assume that the training and test data are identically distributed. This assumption, however, is often violated in real applications, particularly in health care research, where the training data (source) may underrepresent specific subpopulations in the testing or target domain. Such disparities, coupled with censored observations, present significant challenges for investigators aiming to make predictions for those minority groups. This paper focuses on target-independent learning under covariate shift: we study multicalibration for survival probability and restricted mean survival time, and propose a black-box post-processing boosting algorithm designed for censored survival data. Our algorithm, leveraging pseudo-observations, yields a multicalibrated predictor that is competitive with propensity scoring for predictions on the unlabeled target domain, not just overall but across diverse subpopulations. Our theoretical analysis of pseudo-observations relies on the functional delta method and the $p$-variational norm. We further investigate the algorithm's sample complexity and convergence properties, as well as the multicalibration guarantee for post-processed predictors. Our theoretical insights reveal the link between multicalibration and universal adaptability, suggesting that our calibrated function performs comparably to, if not better than, the inverse propensity score weighting estimator. The performance of our proposed methods is corroborated through extensive numerical simulations and a real-world case study on predicting cardiovascular disease risk in two large prospective cohort studies. These empirical results confirm the potential of the proposed approach as a powerful tool for predictive analysis with censored outcomes in diverse and shifting populations.
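
For readers unfamiliar with pseudo-observations, the following is a minimal sketch of the standard jackknife construction for the survival probability at a fixed time, built on a hand-written Kaplan-Meier estimator. It is an illustration under stated assumptions, not the paper's algorithm: the function names `km_survival` and `pseudo_observations` and the toy data are hypothetical, and the post-processing boosting step is not shown.

```python
import numpy as np


def km_survival(time, event, t):
    """Kaplan-Meier estimate of S(t) = P(T > t) (hypothetical helper)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = 1.0
    # Multiply the conditional survival factors over distinct event times <= t.
    for u in np.unique(time[event & (time <= t)]):
        at_risk = np.sum(time >= u)            # subjects still under observation at u
        deaths = np.sum((time == u) & event)   # uncensored events exactly at u
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_observations(time, event, t):
    """Jackknife pseudo-observations for S(t): n * S_hat - (n - 1) * S_hat^{(-i)}."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    full = km_survival(time, event, t)
    keep = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        keep[i] = False                        # leave subject i out
        loo = km_survival(time[keep], event[keep], t)
        pseudo[i] = n * full - (n - 1) * loo
        keep[i] = True                         # restore subject i
    return pseudo


# Toy data (assumed for illustration): event = 1 is an observed event, 0 is censored.
toy_time = [2.0, 3.0, 3.5, 5.0, 6.0, 8.0]
toy_event = [1, 0, 1, 1, 0, 1]
toy_pseudo = pseudo_observations(toy_time, toy_event, t=4.0)
```

Each subject, censored or not, receives a real-valued pseudo-observation that can stand in for the unobserved indicator of surviving past `t`, which is what lets regression-style or boosting-style procedures operate on censored data. When no observation is censored, the construction reduces exactly to that indicator.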
