Class distribution skews in imbalanced datasets may lead to models with prediction bias towards majority classes, making fair assessment of classifiers a challenging task. Metrics such as Balanced Accuracy are commonly used to evaluate a classifier's prediction performance under such scenarios. However, these metrics fall short when classes vary in importance. In this paper, we propose a simple and general-purpose evaluation framework for imbalanced data classification that is sensitive to arbitrary skews in both class cardinalities and class importances. Experiments with several state-of-the-art classifiers, tested on real-world datasets from three different domains, show the effectiveness of our framework, not only in evaluating and ranking classifiers but also in training them.
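To make the limitation of Balanced Accuracy concrete, the following minimal Python sketch compares it with an importance-weighted average of per-class recalls. The weighted variant, the function names, the toy labels, and the importance values are all illustrative assumptions; they are not the framework proposed in the paper, only a simple way to see why a metric that ignores class importance cannot separate two classifiers that err on different classes.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class that occurs in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recalls (standard Balanced Accuracy)."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def importance_weighted_recall(y_true, y_pred, importance):
    """Hypothetical variant: mean of per-class recalls weighted by
    user-supplied class importances (an assumption, not the paper's metric)."""
    recalls = per_class_recall(y_true, y_pred)
    total = sum(importance[c] for c in recalls)
    return sum(importance[c] * r for c, r in recalls.items()) / total


# Toy imbalanced data: 8 "neg" samples and 2 "pos" samples.
y_true = ["neg"] * 8 + ["pos"] * 2
# Classifier A: perfect on "neg", misses one of the two "pos" samples.
y_pred_a = ["neg"] * 8 + ["pos", "neg"]
# Classifier B: perfect on "pos", mislabels half of the "neg" samples.
y_pred_b = ["neg"] * 4 + ["pos"] * 4 + ["pos"] * 2

# Assumed importances: the minority class matters three times as much.
importance = {"neg": 1.0, "pos": 3.0}

ba_a = balanced_accuracy(y_true, y_pred_a)
ba_b = balanced_accuracy(y_true, y_pred_b)
iw_a = importance_weighted_recall(y_true, y_pred_a, importance)
iw_b = importance_weighted_recall(y_true, y_pred_b, importance)

print(ba_a, ba_b)  # both classifiers tie under Balanced Accuracy
print(iw_a, iw_b)  # the weighted variant ranks B above A
```

Under these assumed importances, the two classifiers tie on Balanced Accuracy (0.75 each), while the weighted variant scores them 0.625 and 0.875, so only the latter reflects that errors on the more important class are costlier.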