Automated Model Evaluation (AutoEval) aims to evaluate a trained machine learning model without a labeled test set. Despite promising results, existing AutoEval methods rely heavily on computing the distribution shift between the unlabeled test set and the training set. We believe this reliance on the training set is itself an obstacle to deploying the technology in real-world ML development. In this work, we propose Contrastive Automatic Model Evaluation (CAME), a novel AutoEval framework that removes the training set from the loop. The core idea of CAME rests on a theoretical analysis that links model performance to a contrastive loss. Through extensive empirical validation, we then establish a predictable relationship between the two, using only inference on the unlabeled, unseen test set. The resulting framework sets a new state of the art for AutoEval, significantly surpassing prior work.
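To make the idea of relating a contrastive loss to model performance concrete, here is a minimal sketch and not the CAME implementation. It assumes an NT-Xent (SimCLR-style) loss computed on embeddings of two augmented views of the same unlabeled test samples, and, purely for illustration, a linear mapping from that loss to accuracy. The abstract does not specify the loss form, the temperature, the shape of the relationship, or how calibration pairs are obtained. All function names here are hypothetical.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss for two views of the same n samples.

    z1, z2: arrays of shape (n, d); row i of z1 and row i of z2 are a positive pair.
    The temperature value is an assumption, not taken from the paper.
    """
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity via unit vectors
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                      # a sample is never its own negative
    n = z1.shape[0]
    # Index of the positive partner for each of the 2n rows.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())

def fit_linear(losses, accuracies):
    """Fit accuracy ~ slope * loss + intercept (the linear form is an assumption)."""
    slope, intercept = np.polyfit(losses, accuracies, deg=1)
    return float(slope), float(intercept)

def predict_accuracy(loss, slope, intercept):
    """Estimate accuracy on an unlabeled set from its contrastive loss alone."""
    return float(np.clip(slope * loss + intercept, 0.0, 1.0))
```

In use, one would compute `nt_xent_loss` on the unlabeled test set and pass the result to `predict_accuracy`, with `slope` and `intercept` fitted beforehand on whatever (loss, accuracy) pairs are available. No training data enters the estimate at evaluation time.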