We study the geometry of Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves in binary classification problems. The key finding is that many of the most commonly used binary classification metrics are merely functions of the composition function $G := F_p \circ F_n^{-1}$, where $F_p(\cdot)$ and $F_n(\cdot)$ are the class-conditional cumulative distribution functions of the classifier scores in the positive and negative classes, respectively. This geometric perspective facilitates the selection of operating points, understanding the effect of decision thresholds, and comparison between classifiers. It also helps explain how the shapes and geometry of ROC/PR curves reflect classifier behavior, providing objective tools for building classifiers optimized for specific applications with context-specific constraints. We further explore the conditions for classifier dominance, present analytical and numerical examples demonstrating the effects of class separability and variance on ROC and PR geometries, and derive a link between the positive-to-negative class leakage function $G(\cdot)$ and the Kullback--Leibler divergence. The framework highlights practical considerations, such as model calibration, cost-sensitive optimization, and operating point selection under real-world capacity constraints, enabling more informed approaches to classifier deployment and decision-making.
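To make the role of $G = F_p \circ F_n^{-1}$ concrete, the following is a minimal sketch under assumptions that are not taken from the abstract: a binormal score model (negative scores $\sim N(0,1)$, positive scores $\sim N(\mu,\sigma^2)$) and the convention that a score above the threshold is predicted positive. Under that convention, a threshold at the negative-class quantile $u$ gives $\mathrm{FPR} = 1 - u$ and $\mathrm{TPR} = 1 - G(u)$, so the ROC curve, the precision at a given class prevalence, and the AUC ($= 1 - \int_0^1 G(u)\,du$) can all be computed from $G$ alone. The function names and default parameters are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed negative-class score distribution N(0, 1)


def leakage(u, mu=1.5, sigma=1.0):
    """G(u) = F_p(F_n^{-1}(u)) for the assumed binormal model."""
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    threshold = STD_NORMAL.inv_cdf(u)            # F_n^{-1}(u)
    return NormalDist(mu, sigma).cdf(threshold)  # F_p(threshold)


def roc_point(u, mu=1.5, sigma=1.0):
    """(FPR, TPR) at the threshold whose negative-class quantile is u."""
    return 1.0 - u, 1.0 - leakage(u, mu, sigma)


def precision(u, prevalence, mu=1.5, sigma=1.0):
    """Precision at quantile u for a given positive-class prevalence."""
    fpr, tpr = roc_point(u, mu, sigma)
    tp = prevalence * tpr
    fp = (1.0 - prevalence) * fpr
    return tp / (tp + fp) if tp + fp > 0 else float("nan")


def auc(mu=1.5, sigma=1.0, n=20000):
    """AUC = 1 - integral of G over [0, 1], approximated by the midpoint rule."""
    h = 1.0 / n
    return 1.0 - h * sum(leakage((i + 0.5) * h, mu, sigma) for i in range(n))


if __name__ == "__main__":
    for u in (0.5, 0.9, 0.99):
        fpr, tpr = roc_point(u)
        print(f"u={u:.2f}  FPR={fpr:.3f}  TPR={tpr:.3f}  "
              f"precision@10%={precision(u, 0.10):.3f}")
    print(f"AUC from G: {auc():.4f}")
```

As a sanity check on the sketch, the binormal model has the closed-form AUC $\Phi\big(\mu / \sqrt{1 + \sigma^2}\big)$, which the numerical integral of $G$ should reproduce. Increasing $\mu$ (more separable classes) pushes $G$ toward zero on $(0,1)$ and the ROC curve toward the top-left corner, while precision additionally depends on the prevalence.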