We show that established performance metrics in binary classification, such as the F-score, the Jaccard similarity coefficient or Matthews' correlation coefficient (MCC), are not robust to class imbalance in the sense that if the proportion of the minority class tends to $0$, the true positive rate (TPR) of the Bayes classifier under these metrics tends to $0$ as well. Thus, in imbalanced classification problems, these metrics favour classifiers which ignore the minority class. To alleviate this issue, we introduce robust modifications of the F-score and the MCC for which, even in strongly imbalanced settings, the TPR is bounded away from $0$. We numerically illustrate the behaviour of the various performance metrics in simulations as well as on a credit default data set. We also discuss connections to the ROC and precision-recall curves and give recommendations on how to combine their usage with performance metrics.
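The first claim can be made concrete with a small numerical sketch. It uses a hypothetical toy model that is not the paper's simulation setup: negative scores follow $N(0,1)$, positive scores follow $N(\mu,1)$ with $\mu = 2$, the classifier predicts "positive" when the score exceeds a threshold $t$, and $\pi$ is the minority-class proportion. In population terms the F-score is $F_1 = 2\pi\,\mathrm{TPR} / (\pi\,\mathrm{TPR} + \pi + (1-\pi)\,\mathrm{FPR})$. The sketch grid-searches the threshold that maximises $F_1$ and reports the TPR at that threshold. All function names are illustrative.

```python
import math


def gauss_tail(x: float) -> float:
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def f1_at_threshold(t: float, pi: float, mu: float) -> tuple[float, float]:
    """Population F1 and TPR of the rule 'score > t' in the toy Gaussian model."""
    tpr = gauss_tail(t - mu)  # positives ~ N(mu, 1)
    fpr = gauss_tail(t)       # negatives ~ N(0, 1)
    tp = pi * tpr             # population rates, not sample counts
    fp = (1.0 - pi) * fpr
    fn = pi * (1.0 - tpr)
    return 2.0 * tp / (2.0 * tp + fp + fn), tpr


def f1_optimal_tpr(pi: float, mu: float = 2.0, t_min: float = -5.0,
                   t_max: float = 10.0, steps: int = 15001) -> float:
    """TPR at the grid threshold that maximises the population F1."""
    best_f1, best_tpr = -1.0, 0.0
    for i in range(steps):
        t = t_min + (t_max - t_min) * i / (steps - 1)
        f1, tpr = f1_at_threshold(t, pi, mu)
        if f1 > best_f1:
            best_f1, best_tpr = f1, tpr
    return best_tpr


for pi in (0.5, 0.01, 1e-4, 1e-6):
    print(f"pi={pi:g}  TPR at F1-optimal threshold = {f1_optimal_tpr(pi):.3f}")
```

As $\pi$ shrinks, the F1-optimal threshold moves far into the tail of the positive-class score distribution, so the TPR at that threshold decays towards $0$. Because the Jaccard coefficient is the increasing transformation $J = F_1/(2 - F_1)$ of the F-score, it selects the same threshold and behaves identically in this sketch.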