Predictive risk scores for adverse outcomes are increasingly crucial in guiding health interventions. Such scores may need to be updated periodically as the distributions they model change. However, directly updating a risk score that is itself used to guide interventions can lead to biased risk estimates. To address this, we propose updating using a "holdout set": a subset of the population that does not receive interventions guided by the risk score. The size of the holdout set must be balanced: large enough to ensure good performance of the updated risk score, while keeping the number of held-out samples to a minimum. We prove that this approach reduces adverse outcome frequency to an asymptotically optimal level and argue that often there is no competitive alternative. We describe conditions under which an optimal holdout size (OHS) can be readily identified, and introduce parametric and semi-parametric algorithms for OHS estimation. We apply our methods to the ASPRE risk score for pre-eclampsia to recommend a plan for updating it in the presence of change in the underlying data distribution. We show that the number of pre-eclampsia cases over time is minimised by using a holdout set of around 10,000 individuals.