Auto-evaluation aims to estimate the performance of a trained model on any test dataset without human annotations. Most existing methods represent a dataset by global statistics of the features extracted by the model. This ignores the influence of the classification head and loses the model's category-wise confusion information. However, the ratios of instances assigned to different categories, together with their confidence scores, reflect how many instances in which categories are difficult for the model to classify, and therefore carry significant indicators of both overall and category-wise performance. In this paper, we propose a Confidence-based Category Relation-aware Regression ($C^2R^2$) method. $C^2R^2$ divides all instances in a meta-set into different categories according to their confidence scores and extracts a global representation from them. For each category, $C^2R^2$ encodes its local confusion relations to other categories into a local representation. The overall and category-wise performances are then regressed from the global and local representations, respectively. Extensive experiments show the effectiveness of our method.
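To make the idea of confidence-based representations more concrete, the following is a minimal sketch and not the $C^2R^2$ implementation itself. It assumes the model's softmax outputs on an unlabeled dataset are available as an array. The function name `dataset_representations`, the choice of per-category assignment ratios and mean confidences as the global representation, and the choice of the mean softmax vector of each predicted category as its local representation are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np


def dataset_representations(probs):
    """Build illustrative global and local representations (hypothetical helper).

    probs: array-like of shape (N, K), softmax outputs for N unlabeled instances.
    Returns (global_rep, local_rep) with shapes (2K,) and (K, K).
    """
    probs = np.asarray(probs, dtype=float)
    n, k = probs.shape
    pred = probs.argmax(axis=1)  # category each instance is assigned to
    conf = probs.max(axis=1)     # confidence score of that assignment

    # Ratio of instances assigned to each category.
    ratios = np.bincount(pred, minlength=k) / n

    mean_conf = np.zeros(k)
    local_rep = np.zeros((k, k))
    for c in range(k):
        mask = pred == c
        if mask.any():
            # Mean confidence of the instances assigned to category c.
            mean_conf[c] = conf[mask].mean()
            # Assumed local representation: average probability mass that
            # instances assigned to c place on every category.
            local_rep[c] = probs[mask].mean(axis=0)

    # Assumed global representation: assignment ratios plus mean confidences.
    global_rep = np.concatenate([ratios, mean_conf])
    return global_rep, local_rep
```

Under these assumptions, one could compute such vectors for every sample set in a meta-set whose accuracies are known and fit any standard regressor from `global_rep` to overall accuracy and from each row of `local_rep` to the corresponding category-wise accuracy.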