Recent reinforcement learning approaches have shown surprisingly strong capabilities of bang-bang policies for solving continuous control benchmarks. The underlying coarse action space discretizations often yield favourable exploration characteristics, while final performance does not visibly suffer in the absence of action penalization, in line with optimal control theory. In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency, but action costs can be detrimental to exploration during early training. In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution, taking advantage of recent results in decoupled Q-learning to scale our approach to high-dimensional action spaces up to dim(A) = 38. Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms with surprisingly strong performance on continuous control tasks.
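To make the two core ideas concrete, the following is a minimal, tabular sketch of (a) a decoupled critic that keeps one independent Q-function per action dimension so the greedy argmax decomposes per dimension, and (b) growing the per-dimension discretization from bang-bang toward finer resolution. All names (`DecoupledQTable`, `refine`, `grow_action_grid`) and the nearest-neighbour value-transfer scheme are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def grow_action_grid(num_bins: int) -> np.ndarray:
    """Evenly spaced control values in [-1, 1]; num_bins=2 is bang-bang."""
    return np.linspace(-1.0, 1.0, num_bins)


class DecoupledQTable:
    """Toy decoupled critic: one independent Q-table per action dimension.

    Each dimension d maintains Q_d(s, a_d), so the greedy action is an
    independent argmax per dimension and the cost grows linearly in
    dim(A) rather than exponentially with the joint discretization.
    """

    def __init__(self, num_states: int, action_dim: int, num_bins: int = 2):
        self.grid = grow_action_grid(num_bins)
        self.q = np.zeros((action_dim, num_states, num_bins))

    def act(self, state: int) -> np.ndarray:
        # Greedy joint action: independent argmax in every dimension.
        return self.grid[self.q[:, state, :].argmax(axis=-1)]

    def refine(self, new_num_bins: int) -> None:
        # Grow the control resolution: initialize each new bin from its
        # nearest coarse neighbour (an assumed transfer rule) so the
        # current greedy policy is preserved right after growing.
        old_grid, old_q = self.grid, self.q
        self.grid = grow_action_grid(new_num_bins)
        nearest = np.abs(self.grid[None, :] - old_grid[:, None]).argmin(axis=0)
        self.q = old_q[:, :, nearest]


# Usage sketch: start bang-bang, then refine during training.
critic = DecoupledQTable(num_states=100, action_dim=38, num_bins=2)
a = critic.act(state=0)        # 38-dim action, each entry in {-1, +1}
critic.refine(new_num_bins=5)  # coarse-to-fine: {-1, -0.5, 0, 0.5, 1}
```

In the sketch, only the discretization grows; the per-dimension decomposition keeps memory and the argmax linear in dim(A), which is what allows the coarse-to-fine schedule to scale to high-dimensional action spaces.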