In complex settings such as healthcare, predictive risk scores play an increasingly crucial role in guiding interventions. However, directly updating a risk score that is itself used to guide interventions can lead to biased risk estimates. To address this, we propose updating using a `holdout set', a subset of the population that does not receive interventions guided by the risk score. Balancing the size of the holdout set is essential: it must be large enough for the updated risk score to perform well, yet small enough to minimise the number of held-out samples. We prove that this approach enables total costs to grow at a rate $O\left(N^{2/3}\right)$ for a population of size $N$, and argue that in general circumstances there is no competitive alternative. By defining an appropriate loss function, we describe conditions under which an optimal holdout size (OHS) can be readily identified, and introduce parametric and semi-parametric algorithms for OHS estimation, demonstrating their use on a recent risk score for pre-eclampsia. Based on these results, we make the case that a holdout set is a viable and easily implemented means to safely update predictive risk scores.
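The trade-off behind the optimal holdout size can be illustrated with a minimal numerical sketch. The abstract does not state the loss function, so the form used here is an assumption made purely for illustration: each of the $n$ holdout samples incurs a fixed cost `k1`, and each of the remaining $N - n$ samples incurs a cost `a * n**(-b) + c` that decreases as the holdout set grows. The names `total_cost`, `optimal_holdout_size`, `k1`, `a`, `b` and `c` are hypothetical and may not match the authors' notation or algorithms.

```python
def total_cost(n, N, k1, a, b, c):
    """Assumed (hypothetical) loss for a holdout set of size n in a population of size N.

    Holdout samples cost k1 each; the other N - n samples cost
    a * n**(-b) + c each, an assumed power-law learning curve.
    """
    return k1 * n + (a * n ** (-b) + c) * (N - n)


def optimal_holdout_size(N, k1, a, b, c):
    """Brute-force integer minimiser of the assumed loss over 1 <= n < N."""
    return min(range(1, N), key=lambda n: total_cost(n, N, k1, a, b, c))


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    params = dict(k1=1.0, a=2.0, b=0.5, c=0.2)
    for N in (10_000, 80_000):
        n_star = optimal_holdout_size(N, **params)
        excess = total_cost(n_star, N, **params) - params["c"] * N
        print(N, n_star, round(excess, 1))
```

Under this assumed loss with `b = 0.5` and `k1 > c`, the minimiser scales roughly as $N^{2/3}$, so multiplying $N$ by 8 multiplies the optimal holdout size by about 4. The excess cost over the baseline `c * N` grows at the same rate, which is consistent with the $O\left(N^{2/3}\right)$ rate quoted in the abstract; other learning-curve exponents would give different rates.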