Risk prediction models are increasingly used in healthcare to aid clinical decision making. In most clinical contexts, model calibration (i.e., the reliability of the risk estimates) is critical. Data available for model development are often imbalanced with respect to the modeled outcome (i.e., individuals with and without the event of interest are not equally represented in the data). Researchers commonly correct for this class imbalance, yet the effect of such corrections on the calibration of machine learning models is largely unknown. We studied the effect of imbalance corrections on model calibration for a variety of machine learning algorithms. Using extensive Monte Carlo simulations, we compared the out-of-sample predictive performance of models developed with and without a correction for class imbalance across data-generating scenarios that varied in sample size, number of predictors, and event fraction. We illustrated our findings in a case study using MIMIC-III data. In all simulation scenarios, prediction models developed without a correction for class imbalance had equal or better calibration than models developed with a correction. The miscalibration introduced by correcting for class imbalance was characterized by overestimation of risk and could not always be removed by recalibration. Correcting for class imbalance is not always necessary and may even be harmful for clinical prediction models that aim to produce reliable risk estimates for individuals.
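The overestimation of risk described above can be reproduced in a small simulation. The following is a minimal sketch, not the study's actual design: it assumes a logistic data-generating model, an unpenalized logistic regression as the learner, and random undersampling as the imbalance correction, none of which are specified in the abstract. All function names, coefficients, and sample sizes are hypothetical choices made for illustration. The sketch compares the mean predicted risk with the observed event fraction in independent test data, a simple check of calibration-in-the-large.

```python
import numpy as np


def simulate(n, n_predictors, intercept, rng):
    """Draw standard-normal predictors and a binary outcome from a logistic model.

    The coefficient value 0.5 is an arbitrary illustrative assumption.
    """
    X = rng.normal(size=(n, n_predictors))
    beta = np.full(n_predictors, 0.5)
    p = 1.0 / (1.0 + np.exp(-(intercept + X @ beta)))
    y = rng.binomial(1, p)
    return X, y


def fit_logistic(X, y, n_iter=25):
    """Fit an unpenalized logistic regression with Newton-Raphson updates."""
    Xd = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ w)))
        grad = Xd.T @ (y - p)
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd
        w += np.linalg.solve(hess, grad)
    return w


def predict(w, X):
    """Return predicted risks for the rows of X."""
    return 1.0 / (1.0 + np.exp(-(w[0] + X @ w[1:])))


def undersample(X, y, rng):
    """Random undersampling: keep all events and an equal number of non-events."""
    events = np.flatnonzero(y == 1)
    non_events = np.flatnonzero(y == 0)
    keep = rng.choice(non_events, size=len(events), replace=False)
    idx = np.concatenate([events, keep])
    return X[idx], y[idx]


rng = np.random.default_rng(0)

# An intercept of -3.0 gives a low event fraction, i.e. imbalanced data.
X_train, y_train = simulate(20000, 4, -3.0, rng)
X_test, y_test = simulate(20000, 4, -3.0, rng)

# Model developed without an imbalance correction.
w_plain = fit_logistic(X_train, y_train)

# Model developed on undersampled (balanced) training data.
X_bal, y_bal = undersample(X_train, y_train, rng)
w_bal = fit_logistic(X_bal, y_bal)

# Compare mean predicted risk with the observed event fraction in test data.
observed = y_test.mean()
mean_plain = predict(w_plain, X_test).mean()
mean_bal = predict(w_bal, X_test).mean()

print(f"observed event fraction:        {observed:.3f}")
print(f"mean risk, no correction:       {mean_plain:.3f}")
print(f"mean risk, after undersampling: {mean_bal:.3f}")
```

In this setup the uncorrected model's mean predicted risk should track the observed event fraction, while the model trained on balanced data predicts substantially higher risks, because undersampling changes the event fraction the model sees during training. Whether simple recalibration can undo this shift for other learners or corrections is exactly what the abstract reports as not always possible.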