Hyperparameters are a critical factor in reliably training well-performing reinforcement learning (RL) agents. Unfortunately, developing and evaluating automated approaches for tuning such hyperparameters is both costly and time-consuming. As a result, such approaches are often only evaluated on a single domain or algorithm, making comparisons difficult and limiting insights into their generalizability. We propose ARLBench, a benchmark for hyperparameter optimization (HPO) in RL that allows comparisons of diverse HPO approaches while being highly efficient in evaluation. To enable research into HPO in RL, even in settings with low compute resources, we select a representative subset of HPO tasks spanning a variety of algorithm and environment combinations. This selection allows for generating a performance profile of an automated RL (AutoRL) method using only a fraction of the compute previously necessary, enabling a broader range of researchers to work on HPO in RL. With the extensive and large-scale dataset on hyperparameter landscapes that our selection is based on, ARLBench is an efficient, flexible, and future-oriented foundation for research on AutoRL. Both the benchmark and the dataset are available at https://github.com/automl/arlbench.