Several performance measures are used to evaluate binary and multiclass classification tasks. However, individual observations often carry distinct weights, and none of these measures is sensitive to such varying weights. We propose a new weighted Pearson-Matthews Correlation Coefficient (MCC) for binary classification, as well as weighted versions of related multiclass measures. The weighted MCC takes values between $-1$ and $1$. Crucially, it is higher for classifiers that perform better on highly weighted observations, and hence distinguishes them both from classifiers with similar overall performance and from classifiers that perform better on low-weight observations. Furthermore, we prove that the weighted measures are robust with respect to the choice of weights in a precise sense: if the weights are changed by at most $ε$, the value of the weighted measure changes by at most a factor of $ε$ in the binary case and by at most a factor of $ε^2$ in the multiclass case. Our computations demonstrate that the weighted measures clearly identify classifiers that perform better on higher-weighted observations, while the unweighted measures remain completely indifferent to the choice of weights.
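To make the idea concrete, the following is a minimal sketch of one natural way to weight the binary MCC: each confusion-matrix entry is replaced by the sum of the weights of the observations falling in that cell. This construction, the function name `weighted_mcc`, and the toy data are assumptions made for illustration; the abstract does not state the exact definition, so the sketch should not be read as the proposed measure itself.

```python
import math


def weighted_mcc(y_true, y_pred, weights):
    """Binary MCC computed from weight-summed confusion-matrix cells.

    Assumption: labels are 0/1 and weights are non-negative. With all
    weights equal to 1 this reduces to the usual unweighted MCC.
    """
    tp = tn = fp = fn = 0.0
    for t, p, w in zip(y_true, y_pred, weights):
        if t == 1 and p == 1:
            tp += w
        elif t == 0 and p == 0:
            tn += w
        elif t == 0 and p == 1:
            fp += w
        else:  # t == 1 and p == 0
            fn += w
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column sum is zero
    return (tp * tn - fp * fn) / denom


# Toy data (hypothetical): observations 0 and 2 carry high weight.
y_true = [1, 1, 0, 0]
weights = [10, 1, 10, 1]
pred_a = [1, 0, 0, 1]  # errs only on the low-weight observations
pred_b = [0, 1, 1, 0]  # errs only on the high-weight observations
ones = [1, 1, 1, 1]

unweighted_a = weighted_mcc(y_true, pred_a, ones)
unweighted_b = weighted_mcc(y_true, pred_b, ones)
weighted_a = weighted_mcc(y_true, pred_a, weights)
weighted_b = weighted_mcc(y_true, pred_b, weights)
print(unweighted_a, unweighted_b, weighted_a, weighted_b)
```

In this toy setting both classifiers make two errors out of four, so the unweighted score cannot separate them, while the weighted score is positive for the classifier that is correct on the high-weight observations and negative for the other.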