Properly understanding the performance of classifiers is essential in many scenarios. However, the literature often relies on only one or two standard scores to compare classifiers, which fails to capture the nuances of application-specific requirements and can lead to suboptimal classifier selection. Recently, a paper on the foundations of the theory of performance-based ranking introduced a tool, called the Tile, that organizes an infinity of ranking scores into a 2D map. With the Tile, classifiers can be evaluated and compared efficiently across all possible application-specific preferences, rather than through a pair of scores. In this paper, we provide a first hitchhiker's guide to understanding the performance of two-class classifiers by presenting four scenarios, each showcasing a different user profile: a theoretical analyst, a method designer, a benchmarker, and an application developer. In particular, we show that mapping different values onto the Tile yields different interpretative flavors adapted to the user's needs. As an illustration, we use the Tile and these flavors to rank and analyze the performance of 74 state-of-the-art semantic segmentation models in two-class classification through the eyes of the four user profiles. Through these profiles, we demonstrate that the Tile effectively captures the behavior of classifiers in a single visualization while accommodating an infinite number of ranking scores.