Conventional evaluation protocols for machine learning models rely heavily on a labeled test set assumed to be i.i.d., which is often unavailable in real-world applications. Automated Model Evaluation (AutoEval) offers an alternative to this traditional workflow by forming a proxy pipeline that predicts test performance without access to ground-truth labels. Despite its recent successes, AutoEval frameworks still suffer from overconfidence as well as substantial storage and computational costs. To this end, we propose a novel measure -- Meta-Distribution Energy (MDE) -- that makes the AutoEval framework both more efficient and more effective. The core of MDE is to establish a meta-distribution statistic over the information (energy) associated with individual samples, yielding a smoother representation enabled by energy-based learning. We further provide theoretical insight by connecting MDE to the classification loss. Extensive experiments across modalities, datasets, and architectural backbones validate MDE's effectiveness and its superiority over prior approaches. We also demonstrate MDE's versatility through its seamless integration with large-scale models and its easy adaptation to learning scenarios with noisy or imbalanced labels.
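To make the idea concrete, the sketch below illustrates one plausible reading of the abstract: compute a free-energy score per sample from a classifier's logits, then form a dataset-level "meta-distribution" statistic over those energies. The exact MDE definition is not given here, so the Boltzmann weighting and the final expectation are illustrative assumptions, not the authors' formula.

```python
import numpy as np

def energy_score(logits, T=1.0):
    # Free energy of a single sample: E(x) = -T * logsumexp(logits / T).
    # Lower (more negative) energy corresponds to a more confident prediction.
    z = logits / T
    m = z.max()  # subtract the max for numerical stability
    return -T * (m + np.log(np.exp(z - m).sum()))

def meta_distribution_energy(all_logits, T=1.0):
    # Per-sample energies over the whole unlabeled test set.
    energies = np.array([energy_score(l, T) for l in all_logits])
    # Illustrative meta-distribution: Boltzmann weights over the
    # dataset's energies (an assumption, not the paper's exact statistic).
    neg = -energies
    w = np.exp(neg - neg.max())
    p = w / w.sum()
    # Scalar summary: expected energy under that meta-distribution,
    # usable as a label-free proxy that can be regressed against accuracy.
    return float((p * energies).sum())
```

In an AutoEval-style workflow, such a scalar would be computed on each unlabeled test set and correlated with (or regressed onto) the model's true accuracy on held-out labeled sets.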