Properly understanding the performance of classifiers is essential in many scenarios. However, the literature often relies on only one or two standard scores to compare classifiers. This fails to capture the nuances of application-specific requirements and can lead to suboptimal classifier selection. Recently, a paper on the foundations of the theory of performance-based ranking introduced a tool, called the Tile, that organizes infinitely many ranking scores into a 2D map. With the Tile, classifiers can be evaluated and compared efficiently across all possible application-specific preferences, rather than through a single pair of scores. In this paper, we provide a first hitchhiker's guide to understanding the performance of two-class classifiers by presenting four scenarios, each showcasing a different user profile: a theoretical analyst, a method designer, a benchmarker, and an application developer. In particular, we show that mapping different values onto the Tile yields different interpretative flavors adapted to each user's needs. As an illustration, we use the Tile and these flavors to rank and analyze the performance of 74 state-of-the-art semantic segmentation models in a two-class classification setting, through the eyes of the four user profiles. Through these user profiles, we demonstrate that the Tile effectively captures the behavior of classifiers in a single visualization while accommodating an infinite number of ranking scores.