We run an independent comparison of all hyperparameter optimization (hyperopt) engines available in the Ray Tune library. We introduce two ways to normalize and aggregate statistics across data sets and models: one rank-based, and one that sandwiches the score between the random search score and the full grid search score. This allows us to i) rank the hyperopt engines, ii) make generalized and statistically significant statements on how much they improve over random search, and iii) recommend which engine should be used to hyperopt a given learning algorithm. We find that most engines beat random search, but that only three of them (HEBO, AX, and BlendSearch) clearly stand out. We also find that some engines seem to specialize in hyperopting certain learning algorithms, which makes it tricky to use hyperopt in comparison studies, since the choice of the hyperopt technique may favor some of the models in the comparison.
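The two normalizations can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the forms below are assumptions: the sandwiched score is taken to be a linear rescaling that maps the random search score to 0 and the full grid search score to 1, and the rank-based statistic is taken to be the mean rank of each engine across (data set, model) pairs. The function names, engine names, and numbers are hypothetical.

```python
from statistics import mean


def sandwich_score(score, random_score, grid_score):
    """Rescale a score so random search maps to 0 and full grid search to 1.

    Assumed linear form; the abstract only says the score is "sandwiched"
    between the two reference scores.
    """
    span = grid_score - random_score
    if span == 0:
        raise ValueError("grid and random search scores coincide")
    return (score - random_score) / span


def average_ranks(results):
    """Mean rank of each engine over all tasks (rank 1 = best, higher score wins).

    results: {task: {engine: score}}. Ties are not treated specially here;
    tied engines keep their dictionary order.
    """
    ranks = {}
    for scores in results.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, engine in enumerate(ordered, start=1):
            ranks.setdefault(engine, []).append(position)
    return {engine: mean(r) for engine, r in ranks.items()}


# Hypothetical toy results for three engines on two (data set, model) pairs.
toy_results = {
    ("dataset_a", "model_x"): {"engine_1": 0.90, "engine_2": 0.85, "engine_3": 0.80},
    ("dataset_b", "model_x"): {"engine_1": 0.70, "engine_2": 0.75, "engine_3": 0.60},
}
toy_ranks = average_ranks(toy_results)
toy_normalized = sandwich_score(0.75, random_score=0.5, grid_score=1.0)
```

Under this assumed form, a normalized score above 0 means the engine improved on random search, and a score near 1 means it came close to the full grid search result. Both statistics are unit-free, which is what makes them comparable across data sets and models.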