Changes to hyperparameters can have a dramatic effect on model accuracy, so hyperparameter tuning plays an important role in optimizing machine-learning models. An integral part of the tuning process is the evaluation of model checkpoints, which is done using "validators". In a supervised setting, validators evaluate checkpoints by computing accuracy on a labeled validation set. In an unsupervised setting, the validation set has no labels, so accuracy cannot be computed directly and validators must estimate it instead. But what is the best approach to estimating accuracy? In this paper, we consider this question in the context of unsupervised domain adaptation (UDA). Specifically, we propose three new validators, and we compare and rank them against five existing validators on a large dataset of 1,000,000 checkpoints. Extensive experimental results show that two of our proposed validators achieve state-of-the-art performance in various settings. Finally, we find that in many cases, state-of-the-art performance is obtained by a simple baseline method. To the best of our knowledge, this is the largest empirical study of UDA validators to date. Code is available at https://www.github.com/KevinMusgrave/powerful-benchmarker.
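The distinction between computing and estimating accuracy can be made concrete with a minimal sketch. It assumes each checkpoint is summarized by a matrix of logits on the validation set. The function names (`accuracy_validator`, `entropy_validator`, `select_checkpoint`) are hypothetical, and the entropy-based score is only a commonly used label-free proxy chosen for illustration; it is not claimed to be one of the validators proposed or ranked in this paper.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def accuracy_validator(logits, labels):
    """Supervised validator: exact accuracy, which requires labels."""
    return float((logits.argmax(axis=1) == labels).mean())


def entropy_validator(logits):
    """Unsupervised validator (illustrative assumption): negative mean
    prediction entropy. Higher scores mean more confident predictions,
    used here as a label-free stand-in for accuracy."""
    probs = softmax(logits)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return float(-entropy.mean())


def select_checkpoint(scores):
    """Return the index of the checkpoint with the highest validator score."""
    return int(np.argmax(scores))


# Two toy checkpoints evaluated on the same two validation samples.
confident_logits = np.array([[10.0, 0.0], [0.0, 10.0]])
uncertain_logits = np.zeros((2, 2))
labels = np.array([0, 1])  # available only in the supervised setting

scores = [entropy_validator(uncertain_logits), entropy_validator(confident_logits)]
best = select_checkpoint(scores)
```

In this toy case the entropy score and the true accuracy agree on which checkpoint is better, but a checkpoint that is confidently wrong would also score well. That gap between a proxy score and true accuracy is what a ranking of validators has to measure.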