For some classification scenarios, it is desirable to use only those classifications that a trained model makes with high certainty. To identify such high-certainty instances, previous work has proposed accuracy-reject curves. Reject curves make it possible to evaluate and compare the performance of different certainty measures over a range of thresholds for accepting or rejecting classifications. However, accuracy may not be the most suitable evaluation metric for all applications, and precision or recall may be preferable instead. This is the case, for example, for data with imbalanced class distributions. We therefore propose reject curves that evaluate precision and recall: the recall-reject curve and the precision-reject curve. Using prototype-based classifiers from learning vector quantization, we first validate the proposed curves on artificial benchmark data against the accuracy-reject curve as a baseline. We then show on imbalanced benchmarks and real-world medical data that, for these scenarios, the proposed precision-reject and recall-reject curves yield more accurate insights into classifier performance than accuracy-reject curves.
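To illustrate the idea of a reject curve, the following is a minimal sketch and not the authors' implementation. It assumes a binary problem, a per-sample certainty score where higher means more certain, and that all three metrics are computed only on the accepted samples; the abstract does not state the exact definitions, so these choices, as well as the function name `reject_curves` and its parameters, are hypothetical.

```python
import numpy as np

def reject_curves(y_true, y_pred, certainty, thresholds, positive=1):
    """Return (threshold, reject_rate, accuracy, precision, recall) per threshold.

    Assumption: a sample is accepted if certainty >= threshold, and the
    metrics are evaluated on accepted samples only.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    certainty = np.asarray(certainty, dtype=float)
    nan = float("nan")
    rows = []
    for t in thresholds:
        accepted = certainty >= t
        n_acc = int(accepted.sum())
        if n_acc == 0:
            # Everything rejected: the metrics are undefined.
            rows.append((float(t), 1.0, nan, nan, nan))
            continue
        yt, yp = y_true[accepted], y_pred[accepted]
        tp = int(np.sum((yp == positive) & (yt == positive)))
        fp = int(np.sum((yp == positive) & (yt != positive)))
        fn = int(np.sum((yp != positive) & (yt == positive)))
        accuracy = float(np.mean(yt == yp))
        precision = tp / (tp + fp) if (tp + fp) > 0 else nan
        recall = tp / (tp + fn) if (tp + fn) > 0 else nan
        reject_rate = 1.0 - n_acc / len(y_true)
        rows.append((float(t), reject_rate, accuracy, precision, recall))
    return rows

# Toy data, invented for illustration only.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
certainty = [0.9, 0.4, 0.8, 0.3, 0.7, 0.6]
for row in reject_curves(y_true, y_pred, certainty, thresholds=[0.0, 0.5]):
    print(row)
```

Plotting accuracy, precision, or recall against the reject rate (or the threshold) gives the corresponding reject curve. On imbalanced data, the three curves can diverge, which is the situation the abstract refers to.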