In reinforcement learning, temporal abstraction in the action space, exemplified by action repetition, is a technique to facilitate policy learning through extended actions. However, a primary limitation in previous studies of action repetition is its potential to degrade performance, particularly when sub-optimal actions are repeated. This issue often negates the advantages of action repetition. To address this, we propose a novel algorithm named Uncertainty-aware Temporal Extension (UTE). UTE employs ensemble methods to accurately measure uncertainty during action extension. This feature allows policies to strategically choose between emphasizing exploration or adopting an uncertainty-averse approach, tailored to their specific needs. We demonstrate the effectiveness of UTE through experiments in Gridworld and Atari 2600 environments. Our findings show that UTE outperforms existing action repetition algorithms, effectively mitigating their inherent limitations and significantly enhancing policy learning efficiency.
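To make the trade-off between exploration and uncertainty aversion concrete, the following is a minimal sketch of how ensemble disagreement could steer the choice of repetition length. It is an illustration under stated assumptions, not the UTE implementation: the function name `select_repeat_length`, the coefficient `lam`, and the scoring rule (ensemble mean plus `lam` times ensemble standard deviation) are hypothetical choices introduced here, and the value estimates are made-up numbers.

```python
import numpy as np


def select_repeat_length(q_ensemble, lam):
    """Pick how many times to repeat an already chosen action.

    q_ensemble: array of shape (n_members, n_lengths). Entry [k, j] is
        ensemble member k's value estimate for repeating the action
        j + 1 times (hypothetical inputs for this sketch).
    lam: uncertainty coefficient (an assumption of this sketch).
        lam > 0 rewards disagreement (exploration-seeking);
        lam < 0 penalizes it (uncertainty-averse).
    Returns the selected repetition length, starting at 1.
    """
    q = np.asarray(q_ensemble, dtype=float)
    mean = q.mean(axis=0)  # average estimate per repetition length
    std = q.std(axis=0)    # ensemble disagreement as an uncertainty proxy
    scores = mean + lam * std
    return int(np.argmax(scores)) + 1


# Three ensemble members, three candidate lengths (1, 2, 3).
# The members agree on lengths 1 and 3 but disagree on length 2.
estimates = [
    [1.0, 0.0, 0.9],
    [1.0, 1.0, 0.9],
    [1.0, 2.0, 0.9],
]

print(select_repeat_length(estimates, lam=1.0))   # 2: the uncertain option is favored
print(select_repeat_length(estimates, lam=-1.0))  # 1: the uncertain option is avoided
```

With a positive coefficient the agent extends the action into the region where the ensemble disagrees, which favors exploration; with a negative coefficient it falls back to the length whose value all members agree on, which limits the cost of repeating a possibly sub-optimal action.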