Multicalibration for Modeling Censored Survival Data with Universal Adaptability

Traditional statistical and machine learning methods assume that the training and test data sets are identically distributed. This assumption, however, is often violated in real applications, particularly in health care research, where the training data (source) may underrepresent specific subpopulations in the testing or target domain. Such disparities, coupled with censored observations, present significant challenges for investigators aiming to make predictions for those minority groups. This paper focuses on target-independent learning under covariate shift: we study multicalibration for survival probability and restricted mean survival time, and propose a black-box post-processing boosting algorithm designed for censored survival data. Leveraging pseudo observations, our algorithm yields a multicalibrated predictor that is competitive with propensity scoring for predictions on the unlabeled target domain, not just overall but across diverse subpopulations. Our theoretical analysis of pseudo observations relies on the functional delta method and the $p$-variational norm. We further investigate the algorithm's sample complexity and convergence properties, as well as the multicalibration guarantee for post-processed predictors. Our theoretical insights reveal the link between multicalibration and universal adaptability, suggesting that our calibrated function performs comparably to, if not better than, the inverse propensity score weighting estimator. The performance of the proposed methods is corroborated through extensive numerical simulations and a real-world case study on predicting cardiovascular disease risk in two large prospective cohort studies. These empirical results confirm the algorithm's potential as a powerful tool for predictive analysis with censored outcomes in diverse and shifting populations.
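
To illustrate the pseudo observations mentioned above, the following is a minimal sketch, assuming the standard jackknife (leave-one-out) construction based on the Kaplan-Meier estimator of the survival probability $S(t) = P(T > t)$. The function names `km_survival` and `pseudo_observations` and the toy data are hypothetical and chosen only for illustration; this is not the paper's implementation, and it does not cover restricted mean survival time or the boosting step.

```python
import numpy as np


def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t) = P(T > t).

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    surv = 1.0
    # Multiply the conditional survival factors over distinct event times <= t.
    for u in np.unique(times[events & (times <= t)]):
        at_risk = np.sum(times >= u)
        deaths = np.sum((times == u) & events)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_observations(times, events, t):
    """Jackknife pseudo observations for S(t): n * S_hat - (n - 1) * S_hat_{-i}."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    n = len(times)
    full = km_survival(times, events, t)
    keep = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        keep[i] = False  # leave subject i out
        loo = km_survival(times[keep], events[keep], t)
        pseudo[i] = n * full - (n - 1) * loo
        keep[i] = True   # restore subject i
    return pseudo


# Toy data (hypothetical): five subjects, two of them censored.
toy_times = [2.0, 3.0, 5.0, 7.0, 11.0]
toy_events = [1, 0, 1, 0, 1]
toy_pseudo = pseudo_observations(toy_times, toy_events, t=6.0)
```

Each pseudo observation serves as a stand-in for the unobserved indicator $1\{T_i > t\}$, so a post-processing procedure can treat it as an ordinary regression response. When no observation is censored, the pseudo observations reduce exactly to those indicators.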