Actor-critic deep reinforcement learning (DRL) algorithms have recently achieved notable success on a variety of challenging reinforcement learning (RL) problems, particularly complex control tasks with high-dimensional continuous state and action spaces. Nevertheless, existing research has shown that actor-critic DRL algorithms often fail to explore their learning environments effectively, resulting in limited learning stability and performance. To address this limitation, several ensemble DRL algorithms have recently been proposed to boost exploration and stabilize the learning process. However, most existing ensemble algorithms do not explicitly train all base learners to jointly optimize the performance of the ensemble. In this paper, we propose a new technique for training an ensemble of base learners, based on an innovative multi-step integration method. This training technique enables us to develop a new hierarchical learning algorithm for ensemble DRL that effectively promotes inter-learner collaboration through stable inter-learner parameter sharing. The design of our new algorithm is verified theoretically. The algorithm is also shown empirically to outperform several state-of-the-art DRL algorithms on multiple benchmark RL problems.