Energy-based Automated Model Evaluation

Conventional evaluation protocols for machine learning models rely heavily on a labeled test set that is assumed to be i.i.d., which is often unavailable in real-world applications. Automated Model Evaluation (AutoEval) offers an alternative to this workflow by building a pipeline that approximately predicts test performance without ground-truth labels. Despite recent successes, AutoEval frameworks still suffer from overconfidence and from substantial storage and computational costs. To address this, we propose a novel measure, Meta-Distribution Energy (MDE), that makes the AutoEval framework both more efficient and more effective. The core of MDE is to establish a meta-distribution statistic on the information (energy) associated with individual samples, and then to offer a smoother representation enabled by energy-based learning. We further provide theoretical insight by connecting MDE with the classification loss. Extensive experiments across modalities, datasets, and architectural backbones validate MDE and show its superiority over prior approaches. We also demonstrate MDE's versatility by showing its seamless integration with large-scale models and its easy adaptation to learning scenarios with noisy or imbalanced labels. Code and data are available at https://github.com/pengr/Energy_AutoEval
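
To make the idea of a dataset-level statistic over per-sample energies concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard free-energy score from energy-based learning, computed as the negative temperature-scaled log-sum-exp of a classifier's logits, and it assumes one plausible form of the meta-distribution statistic: the mean negative log of a softmax taken over the energies of all unlabeled samples. The function names, the `temperature` parameter, and the exact form of the statistic are illustrative assumptions that the abstract does not confirm.

```python
import numpy as np


def energy_scores(logits, temperature=1.0):
    """Per-sample free energy: -T * logsumexp(logits / T) over the class axis."""
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    lse = m.squeeze(1) + np.log(np.exp(z - m).sum(axis=1))
    return -temperature * lse


def meta_distribution_energy(logits, temperature=1.0):
    """Assumed dataset-level statistic: mean negative log-softmax of the energies.

    The softmax here is taken across samples (not classes), so every sample's
    energy is normalized against the energies of the whole unlabeled set.
    """
    e = energy_scores(logits, temperature)
    m = e.max()
    log_norm = m + np.log(np.exp(e - m).sum())  # log of the sum of exp(energy)
    return float(-(e - log_norm).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_logits = rng.normal(size=(8, 5))  # 8 unlabeled samples, 5 classes
    print(meta_distribution_energy(fake_logits))
```

In an AutoEval setting, a score like this would be computed on several unlabeled sets and then related to measured accuracy (for example by regression) on sets where labels exist. Only logits are needed, so no labels and no training-set features have to be stored.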

