What Is the Optimal Ranking Score Between Precision and Recall? We Can Always Find It and It Is Rarely $F_1$

Ranking methods or models based on their performance is of prime importance but is tricky because performance is fundamentally multidimensional. In the case of classification, precision and recall are scores with probabilistic interpretations that are both important to consider and complementary. The rankings induced by these two scores are often in partial contradiction. In practice, therefore, it is extremely useful to establish a compromise between the two views to obtain a single, global ranking. Over the last fifty years or so, it has been proposed to take a weighted harmonic mean, known as the F-score, F-measure, or $F_β$. Generally speaking, by averaging basic scores, we obtain a score that is intermediate in terms of values. However, there is no guarantee that these scores lead to meaningful rankings and no guarantee that the rankings are good tradeoffs between these base scores. Given the ubiquity of $F_β$ scores in the literature, some clarification is in order. Concretely: (1) We establish that $F_β$-induced rankings are meaningful and define a shortest path between precision- and recall-induced rankings. (2) We frame the problem of finding a tradeoff between two scores as an optimization problem expressed with Kendall rank correlations. We show that $F_1$ and its skew-insensitive version are far from being optimal in that regard. (3) We provide theoretical tools and a closed-form expression to find the optimal value for $β$ for any distribution or set of performances, and we illustrate their use on six case studies. Code is available at https://github.com/pierard/cvpr-2026-optimal-tradeoff-precision-recall.
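To make the framing in (2) concrete, the following is a minimal sketch, not the authors' implementation. It computes $F_β$ for a set of (precision, recall) performances and measures how well the $F_β$-induced ranking agrees with the precision- and recall-induced rankings using Kendall's tau. Several parts are assumptions made for illustration: the function names, the use of the tau-a variant, the grid search over candidate $β$ values, and in particular the objective (maximizing the smaller of the two correlations). The abstract does not state the exact objective or the closed-form expression.

```python
from itertools import combinations


def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall (the F_beta score)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    total = 0
    pairs = 0
    for i, j in combinations(range(len(xs)), 2):
        prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
        total += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
        pairs += 1
    return total / pairs


def tradeoff(perfs, beta):
    """Correlation of the F_beta ranking with the precision and recall rankings."""
    precisions = [p for p, _ in perfs]
    recalls = [r for _, r in perfs]
    scores = [f_beta(p, r, beta) for p, r in perfs]
    return kendall_tau(scores, precisions), kendall_tau(scores, recalls)


def best_beta(perfs, betas):
    """Grid search for beta.

    ASSUMPTION: the objective (maximize the smaller of the two
    correlations) is chosen for illustration only; it is not taken
    from the paper.
    """
    return max(betas, key=lambda b: min(tradeoff(perfs, b)))


if __name__ == "__main__":
    # Hypothetical performances (precision, recall) of four models.
    perfs = [(0.9, 0.2), (0.7, 0.5), (0.6, 0.6), (0.4, 0.9)]
    betas = [0.25, 0.5, 1.0, 2.0, 4.0]
    for b in betas:
        print(b, tradeoff(perfs, b))
    print("selected beta:", best_beta(perfs, betas))
```

With $β = 0$ the score reduces to precision, and as $β$ grows it approaches recall, so sweeping $β$ moves the induced ranking from one extreme to the other. A grid search is used here only because it is easy to read; the paper reports a closed-form expression that would replace it.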
