When modelling data with a dichotomous, highly imbalanced response, response-based sampling that retains only a subset of the majority class (i.e., undersampling) is often used to create a more balanced training dataset prior to modelling. However, models fit to these undersampled data, which we refer to as base models, generate severely biased predictions. Several calibration methods can be used to combat this bias, one of which is Platt's scaling, in which a logistic regression model captures the relationship between the base model's original predictions and the response. Despite its popularity for calibrating models after undersampling, Platt's scaling was not designed for this purpose. Our work presents what we believe is the first detailed study of the validity of using Platt's scaling to calibrate models after undersampling. We show analytically, as well as via a simulation study and a case study, that Platt's scaling should not be used for calibration after undersampling without critical thought. If Platt's scaling could have successfully calibrated the base model had that model been trained on the entire dataset (i.e., without undersampling), then Platt's scaling may be appropriate for calibration after undersampling. Otherwise, we recommend a modified version of Platt's scaling that fits a logistic generalized additive model to the logit of the base model's predictions; this variant is both theoretically motivated and performed well across the settings considered in our study.
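As a minimal numeric sketch (not taken from the paper's own experiments), the logit variant of Platt's scaling can be fit by Newton-Raphson on the logistic log-likelihood. The simulation below is a deliberately idealized assumption: the undersampling bias is modelled as an exact logit shift of -log(beta), where `beta` (the fraction of the majority class retained) and the helper `platt_fit` are hypothetical names introduced here for illustration. In this idealized case Platt's scaling should recover slope a ≈ 1 and intercept b ≈ log(beta), which is precisely the situation where it can succeed; real base models need not be biased by a pure logit shift.

```python
import numpy as np

rng = np.random.default_rng(0)

def logit(p):
    return np.log(p / (1 - p))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Simulate an imbalanced problem: true event probabilities and binary outcomes.
n = 200_000
p_true = sigmoid(rng.normal(-2.0, 1.0, n))  # mostly small event probabilities
y = rng.binomial(1, p_true)

# Idealized assumption: a base model trained after undersampling the majority
# class at rate beta over-predicts by an exact logit shift of -log(beta).
beta = 0.1  # hypothetical: 10% of the majority class retained
p_base = sigmoid(logit(p_true) - np.log(beta))  # inflated predictions

def platt_fit(scores, y, iters=25):
    """Platt's scaling on the logit scale: fit sigmoid(a*logit(s) + b) by
    Newton-Raphson on the logistic log-likelihood."""
    x = logit(scores)
    a, b = 1.0, 0.0
    for _ in range(iters):
        p = sigmoid(a * x + b)
        w = p * (1 - p)
        g = np.array([np.sum((p - y) * x), np.sum(p - y)])   # gradient
        H = np.array([[np.sum(w * x * x), np.sum(w * x)],
                      [np.sum(w * x),     np.sum(w)]])       # Hessian
        a, b = np.array([a, b]) - np.linalg.solve(H, g)
    return a, b

a, b = platt_fit(p_base, y)
p_cal = sigmoid(a * logit(p_base) + b)  # calibrated probabilities
```

The modified version recommended in the abstract would replace the linear term `a * logit(p_base) + b` with a smooth function of the logit, fit as a logistic generalized additive model (e.g., with a GAM library), which can absorb biases that are not a simple linear shift on the logit scale.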