The separation of performance metrics from gradient-based loss functions may not always give optimal results and may miss vital aggregate information. This paper investigates incorporating a performance metric alongside differentiable loss functions to inform training outcomes. The goal is to guide model performance and interpretation by assuming statistical distributions on this performance metric for dynamic weighting. The focus is on van Rijsbergen's $F_{\beta}$ metric, a popular choice for gauging classification performance. Through distributional assumptions on $F_{\beta}$, an intermediary link to the standard binary cross-entropy can be established via dynamic penalty weights. First, the $F_{\beta}$ metric is reformulated to facilitate assuming statistical distributions, with accompanying proofs for the cumulative distribution function. These probabilities are used within a knee curve algorithm to find an optimal $\beta$, denoted $\beta_{opt}$. This $\beta_{opt}$ is used as a weight or penalty in the proposed weighted binary cross-entropy. Experiments on publicly available data, along with benchmark analysis, mostly yield better and more interpretable results than the baseline for both imbalanced and balanced classes. For example, for the IMDB text data with known labeling errors, a 14% boost in $F_1$ score is shown. The results also reveal commonalities between the penalty model families derived in this paper and the suitability of recall-centric or precision-centric parameters used in the optimization. The flexibility of this methodology can enhance interpretation.
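To make the two ingredients named above concrete, the following is a minimal sketch of the standard $F_{\beta}$ computation from confusion counts and of a binary cross-entropy whose positive-class term is scaled by a penalty weight. It is an illustration under stated assumptions, not the paper's implementation: the function names `f_beta` and `weighted_bce`, the parameter `pos_weight`, and the choice to apply the weight to the positive-class term are all hypothetical, and the paper's reformulated $F_{\beta}$, its distributional assumptions, and the knee curve search for $\beta_{opt}$ are not reproduced here.

```python
import math


def f_beta(tp, fp, fn, beta=1.0):
    """Standard F_beta from confusion counts.

    Equivalent to (1 + beta^2) * P * R / (beta^2 * P + R),
    with precision P = tp / (tp + fp) and recall R = tp / (tp + fn).
    Returns 0.0 when the denominator is zero.
    """
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    return (1.0 + b2) * tp / denom if denom > 0 else 0.0


def weighted_bce(y_true, y_prob, pos_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with the positive-class term scaled.

    Assumption: the penalty multiplies only the positive-class term;
    the paper's exact weighting scheme may differ.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical value standing in for a beta_opt found by some search.
    beta_opt = 2.0
    labels = [1, 0, 1, 1]
    probs = [0.9, 0.2, 0.4, 0.7]
    print("F_beta:", f_beta(tp=8, fp=2, fn=4, beta=beta_opt))
    print("plain BCE:", weighted_bce(labels, probs))
    print("weighted BCE:", weighted_bce(labels, probs, pos_weight=beta_opt))
```

With `pos_weight=1.0` the loss reduces to ordinary binary cross-entropy, so the weight can be read as a dial: values above 1 penalize missed positives more heavily (recall-centric), and values below 1 penalize them less (precision-centric).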