Modern machine learning models are often built with multiple objectives in mind, e.g., minimizing inference time while maximizing accuracy. Multi-objective hyperparameter optimization (MHPO) algorithms return a set of such candidate models, and the quality of the resulting approximation of the Pareto front is used to assess the algorithms' performance. In practice, we also want to measure generalization when moving from the validation set to the test set. However, some of the returned models might no longer be Pareto-optimal on the test set, which makes it unclear how to quantify the performance of an MHPO method there. To resolve this, we provide a novel evaluation protocol that allows measuring the generalization performance of MHPO methods, and we study its capabilities for comparing two optimization experiments.
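The problem described above can be made concrete with a small sketch. It illustrates only the issue, not the proposed evaluation protocol. The objective values are made up, both objectives (error and inference time) are assumed to be minimized, and the helper names `dominates` and `pareto_indices` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: each model is scored as (error, inference_time), both minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_indices(points: Sequence[Point]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical scores for four candidate models (not taken from the paper).
valid_scores = [(0.10, 5.0), (0.12, 3.0), (0.20, 1.0), (0.15, 4.0)]
test_scores = [(0.14, 5.0), (0.13, 3.0), (0.21, 1.0), (0.16, 4.0)]

# Models an MHPO run would return: the Pareto front on the validation set.
returned = pareto_indices(valid_scores)

# Re-evaluate only the returned models on the test set and check which of
# them are still non-dominated among themselves.
returned_test = [test_scores[i] for i in returned]
still_optimal = [returned[k] for k in pareto_indices(returned_test)]

print("returned (validation front):", returned)
print("still non-dominated on test:", still_optimal)
```

In this toy data, one model that was on the validation front is dominated by another returned model once both are scored on the test set, so an indicator computed on the returned set no longer describes a proper Pareto front.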