Hyperparameters are a critical factor in reliably training well-performing reinforcement learning (RL) agents. Unfortunately, developing and evaluating automated approaches for tuning such hyperparameters is both costly and time-consuming. As a result, such approaches are often evaluated only on a single domain or algorithm, making comparisons difficult and limiting insights into their generalizability. We propose ARLBench, a benchmark for hyperparameter optimization (HPO) in RL that allows comparisons of diverse HPO approaches while being highly efficient to evaluate. To enable research on HPO in RL even in settings with low compute resources, we select a representative subset of HPO tasks spanning a variety of algorithm and environment combinations. This selection allows a performance profile of an automated RL (AutoRL) method to be generated using only a fraction of the compute previously necessary, enabling a broader range of researchers to work on HPO in RL. Together with the extensive, large-scale dataset of hyperparameter landscapes on which our selection is based, ARLBench provides an efficient, flexible, and future-oriented foundation for research on AutoRL. Both the benchmark and the dataset are available at https://github.com/automl/arlbench.
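To make the benchmark's intended use concrete, below is a minimal sketch of how an HPO method might query an ARLBench task. It assumes a gymnasium-style environment interface in which `reset()` starts a fresh training run and `step(config)` trains the RL agent under a sampled hyperparameter configuration; the class, attribute, and objective-key names here are assumptions for illustration and may differ from the released API.

```python
# Hypothetical sketch of a tiny random-search HPO loop over one ARLBench task.
# All names below (AutoRLEnv, config_space, "reward_mean") are assumptions,
# not the confirmed public API.
from arlbench import AutoRLEnv  # assumed entry point

env = AutoRLEnv()  # assumed: defaults pick one algorithm/environment combination

best_score, best_config = float("-inf"), None
for _ in range(10):  # tiny search budget, purely for illustration
    obs, info = env.reset()  # assumed: begins a fresh training run
    # Sample a hyperparameter configuration from the task's search space
    # (assumed attribute name).
    config = env.config_space.sample_configuration()
    # Assumed: one step trains the agent under `config` and returns objectives.
    obs, objectives, terminated, truncated, info = env.step(config)
    score = objectives.get("reward_mean", float("-inf"))  # assumed objective key
    if score > best_score:
        best_score, best_config = score, config

print(best_score, best_config)
```

Under this assumed interface, any HPO method that can propose configurations and consume scalar objectives could be plugged into the loop in place of random sampling.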