The harms of class imbalance corrections for machine learning based prediction models: a simulation study

Risk prediction models are increasingly used in healthcare to aid clinical decision making. In most clinical contexts, model calibration (i.e., the reliability of the risk estimates) is critical. Data available for model development are often not balanced with respect to the modeled outcome (i.e., individuals with and without the event of interest are not equally represented in the data). Researchers commonly correct for this class imbalance, yet the effect of such corrections on the calibration of machine learning models is largely unknown. We studied the effect of imbalance corrections on model calibration for a variety of machine learning algorithms. Using extensive Monte Carlo simulations, we compared the out-of-sample predictive performance of models developed with an imbalance correction to that of models developed without one, across different data-generating scenarios (varying sample size, number of predictors, and event fraction). Our findings were illustrated in a case study using MIMIC-III data. In all simulation scenarios, prediction models developed without a correction for class imbalance had equal or better calibration performance than prediction models developed with a correction. The miscalibration introduced by correcting for class imbalance was characterized by an over-estimation of risk and could not always be corrected with re-calibration. Correcting for class imbalance is not always necessary and may even be harmful for clinical prediction models that aim to produce reliable risk estimates on an individual basis.
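Why rebalancing tends to push risk estimates upward can be seen in a simple idealized case. The sketch below is not the simulation design of the study; it assumes a correctly specified logistic model and random over- or undersampling that changes only the event fraction, in which case the estimated log-odds are shifted by a constant offset. The function names (`shifted_risk`, `undo_shift`) are hypothetical and chosen for illustration only.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def expit(z):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def shifted_risk(true_risk, original_fraction, resampled_fraction=0.5):
    """Risk an idealized logistic model would report after resampling.

    Assumption (illustrative): resampling changes the event fraction from
    `original_fraction` to `resampled_fraction` without altering the
    predictor distribution within each outcome class. The log-odds then
    move by the difference of the two prior log-odds.
    """
    offset = logit(resampled_fraction) - logit(original_fraction)
    return expit(logit(true_risk) + offset)


def undo_shift(reported_risk, original_fraction, resampled_fraction=0.5):
    """Remove the same offset again (an intercept-only re-calibration)."""
    offset = logit(resampled_fraction) - logit(original_fraction)
    return expit(logit(reported_risk) - offset)


if __name__ == "__main__":
    event_fraction = 0.10  # hypothetical: 10% of individuals have the event
    for true_risk in (0.02, 0.10, 0.30):
        inflated = shifted_risk(true_risk, event_fraction)
        restored = undo_shift(inflated, event_fraction)
        print(f"true={true_risk:.2f} balanced={inflated:.3f} restored={restored:.3f}")
```

In this idealized setting an individual whose true risk equals the 10% event fraction is assigned a risk of 50% after balancing, and subtracting the offset recovers the original value exactly. The abstract reports that re-calibration did not always remove the miscalibration, so for flexible machine learning algorithms the distortion should not be expected to reduce to a single constant offset as it does here.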

