The exploration--exploitation dilemma poses a significant challenge in reinforcement learning (RL). Recently, curiosity-based exploration methods have achieved great success in tackling hard-exploration problems. However, they require extensive hyperparameter tuning across different environments, which heavily limits the applicability and accessibility of this line of methods. In this paper, we characterize this problem through an analysis of agent behavior and conclude that choosing a proper hyperparameter is fundamentally difficult. We further identify the difficulty and instability of the optimization when the agent learns with curiosity. We propose hyperparameter-robust exploration (\textbf{Hyper}), which substantially mitigates the problem by effectively regularizing the visitation induced by exploration and decoupling exploitation to ensure stable training. We theoretically show that \textbf{Hyper} is provably efficient under the function approximation setting and empirically demonstrate its appealing performance and robustness across various environments.