Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym

The growing ubiquity of machine learning (ML) has carried it into many areas of computer science, including black-box optimization (BBO). Recent research is particularly concerned with Bayesian optimization (BO). BO-based algorithms are popular in the ML community, where they are used for hyperparameter optimization and, more generally, for algorithm configuration. However, their efficiency decreases as the dimensionality of the problem and the evaluation budget increase. Meanwhile, derivative-free optimization methods have evolved independently in the optimization community. We therefore seek to understand whether cross-fertilization is possible between the two communities, ML and BBO, i.e., whether algorithms that are heavily used in ML also work well in BBO and vice versa. Existing comparative experiments often involve rather small benchmarks and show visible problems in the experimental setup, such as poor initialization of baselines, overfitting due to problem-specific hyperparameter settings, and low statistical significance. In this paper, we update and extend a comparative study presented by Hutter et al. in 2013. We compare BBO tools for ML with more classical heuristics, first on the well-known BBOB benchmark suite from the COCO environment and then on Direct Policy Search for OpenAI Gym, a reinforcement learning benchmark. Our results confirm that BO-based optimizers perform well on both benchmarks when budgets are limited, albeit at a higher computational cost, while they are often outperformed by algorithms from other families when the evaluation budget becomes larger. We also show that some algorithms from the BBO community perform surprisingly well on ML tasks.
