Although Reinforcement Learning (RL) has been shown to be capable of producing impressive results, its use is limited by the impact of its hyperparameters on performance. This often makes it difficult to achieve good results in practice. Automated RL (AutoRL) addresses this difficulty, yet little is known about the dynamics of the hyperparameter landscapes that hyperparameter optimization (HPO) methods traverse in search of optimal configurations. Because existing AutoRL approaches adjust hyperparameter configurations dynamically, we propose an approach to build and analyze these hyperparameter landscapes not just at one point in time but at multiple points throughout training. Addressing an important open question on the legitimacy of such dynamic AutoRL approaches, we provide thorough empirical evidence that the hyperparameter landscapes vary strongly over time across representative algorithms from the RL literature (DQN, PPO, and SAC) in different kinds of environments (Cartpole, Bipedal Walker, and Hopper). This supports the theory that hyperparameters should be dynamically adjusted during training and shows the potential for further insights into AutoRL problems that can be gained through landscape analyses. Our code can be found at https://github.com/automl/AutoRL-Landscape
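The idea of building one hyperparameter landscape per point in training, rather than a single landscape for the whole run, can be illustrated with a minimal toy sketch. This is not the authors' implementation (that lives in the linked repository). The functions `train_segment`, `evaluate`, and `build_landscapes` are hypothetical names, the "agent" is a single number optimized by noisy gradient descent instead of an RL policy, and continuing each phase from the best configuration's end state is an assumption made only to keep the example short.

```python
import random


def train_segment(x, lr, steps, rng):
    """Toy stand-in for one training segment: noisy gradient descent on x**2."""
    for _ in range(steps):
        grad = 2.0 * x + rng.gauss(0.0, 1.0)  # true gradient plus noise
        x -= lr * grad
    return x


def evaluate(x):
    """Toy stand-in for an evaluation return (higher is better)."""
    return -x * x


def build_landscapes(learning_rates, n_phases=3, steps=20, seed=0):
    """Return one {learning_rate: score} landscape per training phase."""
    rng = random.Random(seed)
    checkpoint = 10.0  # shared starting state for the first phase
    landscapes = []
    for _ in range(n_phases):
        results = {}
        # Every configuration starts from the same checkpoint, so the
        # scores within a phase are directly comparable.
        for lr in learning_rates:
            end_state = train_segment(checkpoint, lr, steps, rng)
            results[lr] = (evaluate(end_state), end_state)
        landscapes.append({lr: score for lr, (score, _) in results.items()})
        # Assumption: the next phase continues from the best end state.
        best_lr = max(results, key=lambda lr: results[lr][0])
        checkpoint = results[best_lr][1]
    return landscapes


if __name__ == "__main__":
    lrs = [0.001, 0.01, 0.1, 0.3]
    for phase, landscape in enumerate(build_landscapes(lrs)):
        best = max(landscape, key=landscape.get)
        print(f"phase {phase}: best lr = {best}")
```

Comparing the per-phase dictionaries returned by `build_landscapes` is the toy analogue of asking whether the landscape changes over time. In a real study, the scalar state would be replaced by saved agent checkpoints and the learning-rate list by sampled multi-dimensional configurations.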