Aligning Multiclass Neural Network Classifier Criterion with Task Performance via $F_\beta$-Score

Multiclass neural network classifiers are typically trained using cross-entropy loss. After training, the same network is evaluated with an application-specific metric computed from the multiclass confusion matrix, such as Macro-$F_\beta$. It is questionable whether minimizing cross-entropy yields a classifier aligned with these application-specific performance criteria, particularly when one aspect of classifier performance must be emphasized. For example, if precision is preferred over recall, $\beta$ in the $F_\beta$ evaluation metric can be set accordingly (values $\beta < 1$ weight precision more heavily), but the cross-entropy objective remains unaware of this preference during training. We propose a method that addresses this training-evaluation gap for multiclass neural network classifiers, so that users can train these models informed by the desired final $F_\beta$-Score. Following prior work in binary classification, we use soft-set confusion matrices and a piecewise-linear approximation of the Heaviside step function. Our method extends the $2 \times 2$ binary soft-set confusion matrix to a multiclass $d \times d$ confusion matrix and dynamically adapts, at run-time, the threshold $\tau$ that parameterizes the piecewise-linear Heaviside approximation. Our theoretical analysis shows that the method can be used to optimize a soft-set approximation of Macro-$F_\beta$ that is a consistent estimator of Macro-$F_\beta$, and extensive experiments demonstrate the practical effectiveness of the approach.
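The soft-set idea can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, and several parts are illustrative assumptions:

- the exact shape of the piecewise-linear Heaviside approximation (linear interpolation through knots at $0$, $\tau - \delta$, $\tau + \delta$, $1$ with values $0$, $\epsilon$, $1 - \epsilon$, $1$, and default $\delta = \min(\tau, 1-\tau)/2$);
- the hypothetical names `heaviside_pl` and `soft_macro_fbeta`;
- the one-vs-rest reading of the $d \times d$ soft confusion matrix.

In addition, $\tau$ is held fixed here rather than adapted at run-time. The computation runs in three steps:

1. Each class probability $p_{ik}$ (e.g. a softmax output) is mapped through $H(p_{ik}; \tau)$.
2. The result is accumulated into row $y_i$, column $k$ of the soft confusion matrix.
3. The per-class soft counts feed the standard formula $F_\beta = (1+\beta^2)\,TP / ((1+\beta^2)\,TP + \beta^2 FN + FP)$, averaged over classes.

```python
from typing import List, Optional, Sequence


def heaviside_pl(p: float, tau: float, delta: Optional[float] = None, eps: float = 0.05) -> float:
    """Hypothetical piecewise-linear approximation of the Heaviside step at threshold tau.

    Interpolates linearly through (0, 0), (tau - delta, eps), (tau + delta, 1 - eps), (1, 1),
    so H(tau) = 0.5 and the slope is nonzero everywhere on [0, 1] when eps > 0.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if delta is None:
        delta = min(tau, 1.0 - tau) / 2.0  # assumed default half-width of the transition band
    if not 0.0 < delta < min(tau, 1.0 - tau):
        raise ValueError("delta must keep the knots strictly inside (0, 1)")
    p = min(max(p, 0.0), 1.0)  # probabilities are expected in [0, 1]
    xs = [0.0, tau - delta, tau + delta, 1.0]
    ys = [0.0, eps, 1.0 - eps, 1.0]
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if p <= x1:
            return y0 + (y1 - y0) * (p - x0) / (x1 - x0)
    return 1.0  # unreachable for p in [0, 1]; kept as a safe fallback


def soft_macro_fbeta(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_classes: int,
    beta: float = 1.0,
    tau: float = 0.5,
    eps_div: float = 1e-12,
) -> float:
    """Soft-set Macro-F_beta computed from per-class probabilities."""
    # conf[j][k]: soft count of samples with true class j assigned to class k
    conf: List[List[float]] = [[0.0] * num_classes for _ in range(num_classes)]
    class_counts = [0] * num_classes
    for p_i, y in zip(probs, labels):
        class_counts[y] += 1
        for k in range(num_classes):
            conf[y][k] += heaviside_pl(p_i[k], tau)

    b2 = beta * beta
    per_class = []
    for k in range(num_classes):
        tp = conf[k][k]
        fp = sum(conf[j][k] for j in range(num_classes) if j != k)  # off-diagonal mass in column k
        fn = class_counts[k] - tp  # soft mass of class-k samples not assigned to k (one-vs-rest)
        per_class.append((1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp + eps_div))
    return sum(per_class) / num_classes


if __name__ == "__main__":
    labels = [0, 1, 2, 2]
    probs = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1], [0.1, 0.3, 0.6], [0.5, 0.1, 0.4]]
    for beta in (0.5, 1.0, 2.0):
        loss = 1.0 - soft_macro_fbeta(probs, labels, num_classes=3, beta=beta)
        print(f"beta={beta}: soft Macro-F_beta loss = {loss:.4f}")
```

With one-hot probabilities, $H(0) = 0$ and $H(1) = 1$, so the soft score reduces to the ordinary Macro-$F_\beta$. Keeping $\epsilon > 0$ is a design choice of this sketch: it leaves a nonzero slope outside the transition band, so confidently wrong predictions still receive a gradient. For actual training, the same arithmetic would be written in an autodiff framework so that $1 - \text{Macro-}F_\beta$ can be backpropagated through $H$.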
