Deep reinforcement learning (deep RL) has achieved tremendous success in various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed technique. Despite their crucial impact on performance, hyper-parameter choices are frequently overshadowed by algorithmic advancements. This paper presents an extensive empirical study of the reliability of hyper-parameter selection for value-based deep reinforcement learning agents, and introduces a new score to quantify the consistency and reliability of various hyper-parameters. Our findings help establish which hyper-parameters are most critical to tune, and clarify which tunings remain consistent across different training regimes.