Deep reinforcement learning (deep RL) has achieved tremendous success in various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed technique. Despite their crucial impact on performance, hyper-parameter choices are frequently overshadowed by algorithmic advancements. This paper conducts an extensive empirical study of the reliability of hyper-parameter selection for value-based deep RL agents, and introduces a new score to quantify the consistency and reliability of various hyper-parameters. Our findings help establish which hyper-parameters are most critical to tune, and also clarify which tunings remain consistent across different training regimes.