Moderate calibration, the property that the expected event probability among observations with predicted probability z equals z, is a desirable property of risk prediction models. Current graphical and numerical techniques for evaluating moderate calibration of risk prediction models are mostly based on smoothing or grouping the data. Moreover, there is no widely accepted inferential method for testing the null hypothesis that a model is moderately calibrated. In this work, we discuss recently developed methods, and propose novel ones, for assessing moderate calibration for binary responses. The methods are based on the fact that functions of standardized partial sums of prediction errors converge in distribution to the corresponding functionals of Brownian motion. The novel method relies on well-known properties of the Brownian bridge, which enable joint inference on mean and moderate calibration and lead to a unified 'bridge' test for detecting miscalibration. Simulation studies indicate that the bridge test is more powerful, often substantially, than the alternative test. As a case study, we consider a prediction model for short-term mortality after a heart attack and provide suggestions on graphical presentation and the interpretation of results. With these methods, moderate calibration can be assessed without arbitrary grouping of data and without methods that require tuning parameters.
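To make the partial-sum construction concrete, the following is a minimal sketch of one plausible reading of the bridge test, not necessarily the exact procedure of the paper. It assumes that observations are ordered by predicted risk, that "time" is the cumulative share of the Bernoulli variance p(1 - p), that mean calibration is tested through the endpoint of the scaled path, that moderate calibration is tested through the supremum of the bridge obtained by removing the endpoint's linear trend, and that the two p-values are combined with Fisher's method. The names `kolmogorov_sf` and `bridge_test` are hypothetical, and all predicted risks are assumed to lie strictly between 0 and 1.

```python
import math
import random


def kolmogorov_sf(x):
    """P(sup |B(t)| > x) for a standard Brownian bridge B on [0, 1]."""
    if x < 0.2:  # the CDF is below 1e-12 here, so the tail is effectively 1
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


def bridge_test(y, p):
    """Sketch of a bridge-style calibration test.

    y: binary outcomes (0/1); p: predicted risks, assumed strictly in (0, 1).
    """
    # Order observations by predicted risk (assumption of this sketch).
    pairs = sorted(zip(p, y), key=lambda r: r[0])
    total_var = sum(pi * (1.0 - pi) for pi, _ in pairs)
    scale = math.sqrt(total_var)

    s = 0.0  # running partial sum of prediction errors y - p
    t = 0.0  # running "time": cumulative share of the total variance
    path = []
    for pi, yi in pairs:
        s += yi - pi
        t += pi * (1.0 - pi) / total_var
        path.append((t, s / scale))

    w1 = path[-1][1]  # endpoint, approximately N(0, 1) under the null
    # Removing the linear trend t * w1 turns the path into a bridge.
    bridge_sup = max(abs(w - ti * w1) for ti, w in path)

    p_mean = math.erfc(abs(w1) / math.sqrt(2.0))  # two-sided normal p-value
    p_bridge = kolmogorov_sf(bridge_sup)

    # Fisher's method for two independent p-values; the chi-square(4) tail
    # has the closed form prod * (1 - log(prod)).
    prod = p_mean * p_bridge
    p_combined = prod * (1.0 - math.log(prod)) if prod > 0.0 else 0.0
    return {"w1": w1, "bridge_sup": bridge_sup, "p_mean": p_mean,
            "p_bridge": p_bridge, "p_combined": p_combined}


# Usage on simulated data where outcomes are drawn from the predicted risks,
# so the model is calibrated by construction.
random.seed(1)
risks = [random.uniform(0.05, 0.95) for _ in range(2000)]
outcomes = [1 if random.random() < r else 0 for r in risks]
print(bridge_test(outcomes, risks))
```

The sketch needs no grouping and no smoothing parameter, which is the practical appeal described above. Ties in predicted risk are left in their input order here, which a careful implementation may want to handle explicitly.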