Clinical risk prediction models are regularly updated as new data, often with additional covariates, become available. We propose CARE (Convex Aggregation of relative Risk Estimators) as a general approach for combining existing "external" estimators with a new data set in a time-to-event survival analysis setting. Our method first uses the new data to fit a flexible family of reproducing kernel estimators via penalised partial likelihood maximisation. The final relative risk estimator is then constructed as a convex combination of the kernel and external estimators, with the convex combination coefficients and regularisation parameters selected by cross-validation. We establish high-probability bounds for the $L_2$-error of the aggregated estimator, showing that its rate of convergence is at least as good as that of both the optimal kernel estimator and the best external model. Empirical results from simulation studies agree with the theory, and we illustrate the improvements our method provides for cardiovascular disease risk modelling. Our methodology is implemented in the Python package care-survival.
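To make the aggregation step concrete, the following is a minimal NumPy sketch under several stated assumptions. It does not use or reproduce the care-survival API. It assumes a single external estimator, takes the convex combination on the log relative risk scale, and picks the weight by grid search over a held-out validation set instead of full cross-validation. The function names `neg_log_partial_likelihood` and `select_convex_weight` are hypothetical, and the kernel and external scores are assumed to be precomputed on the validation subjects.

```python
import numpy as np


def neg_log_partial_likelihood(scores, times, events):
    """Negative Cox log partial likelihood of log relative risk scores.

    Assumes no tied event times (no tie correction is applied).
    """
    scores = np.asarray(scores, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    # Sort by decreasing time so that a running accumulation over the
    # sorted scores covers exactly the risk set {j : T_j >= T_i}.
    order = np.argsort(-times)
    s = scores[order]
    e = events[order]
    # log(sum_j exp(s_j)) over each risk set, computed stably.
    log_risk = np.logaddexp.accumulate(s)
    return -np.sum((s - log_risk) * e)


def select_convex_weight(f_kernel, f_external, times, events, grid_size=101):
    """Grid search for theta in [0, 1] minimising the validation loss of
    (1 - theta) * f_kernel + theta * f_external."""
    f_kernel = np.asarray(f_kernel, dtype=float)
    f_external = np.asarray(f_external, dtype=float)
    thetas = np.linspace(0.0, 1.0, grid_size)
    losses = np.array([
        neg_log_partial_likelihood(
            (1.0 - th) * f_kernel + th * f_external, times, events
        )
        for th in thetas
    ])
    best = int(np.argmin(losses))
    return float(thetas[best]), float(losses[best])


if __name__ == "__main__":
    # Synthetic illustration only: the true log relative risk is x,
    # the "external" scores equal x, and the "kernel" scores are noise.
    rng = np.random.default_rng(0)
    n = 300
    x = rng.normal(size=n)
    event_times = rng.exponential(1.0 / np.exp(x))
    censor_times = rng.exponential(2.0, size=n)
    times = np.minimum(event_times, censor_times)
    events = (event_times <= censor_times).astype(float)
    theta, loss = select_convex_weight(rng.normal(size=n), x, times, events)
    print(f"selected weight on external estimator: {theta:.2f}")
```

Because the grid contains both endpoints, the selected combination never has a larger validation loss than either estimator used alone, which loosely mirrors the "at least as good as both" guarantee stated above, although only on the validation sample and not in the $L_2$ sense of the theory.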