Deep reinforcement learning (RL) has achieved remarkable success on complex tasks by using deep neural networks (DNNs) as function approximators. However, this reliance on DNNs introduces a challenge known as primacy bias: the function approximators tend to prioritize early experiences, which leads to overfitting. To mitigate primacy bias, a reset method has been proposed that periodically resets part or all of a deep RL agent while preserving the replay buffer. However, performance can collapse right after a reset, which is detrimental from the perspectives of safe RL and regret minimization. In this paper, we propose a new reset-based method that leverages deep ensemble learning to overcome these limitations of the vanilla reset method and to improve sample efficiency. We evaluate the proposed method in a variety of experiments, including experiments in the safe RL domain. Numerical results show that it achieves high sample efficiency while also addressing safety considerations.
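The reset mechanism described above (periodic resets that keep the replay buffer) can be illustrated with a toy sketch. Everything in it is hypothetical: `Agent`, `train_with_resets`, the placeholder update, and the round-robin schedule for ensemble members are assumptions chosen for illustration, not the paper's procedure. The sketch also does not model how an ensemble would choose actions. With a single agent it reduces to the vanilla reset schedule; with several agents, only one member is reset at a time, so the others keep their learned parameters.

```python
import random
from collections import deque


class Agent:
    """Hypothetical stand-in for a deep RL agent; `params` mimics network weights."""

    def __init__(self, n_params: int, seed: int):
        self.n_params = n_params
        self.rng = random.Random(seed)
        self.reinitialize()

    def reinitialize(self) -> None:
        # Draw fresh random weights, discarding everything learned so far.
        self.params = [self.rng.gauss(0.0, 1.0) for _ in range(self.n_params)]
        self.updates_since_reset = 0

    def update(self, batch) -> None:
        # Placeholder for a gradient step on a sampled mini-batch.
        self.updates_since_reset += 1


def train_with_resets(agents, total_steps, reset_interval,
                      buffer_size=10_000, batch_size=32, seed=0):
    """Reset one agent every `reset_interval` steps; the shared buffer is never cleared.

    Members are reset in round-robin order (an assumed staggering scheme).
    """
    rng = random.Random(seed)
    buffer = deque(maxlen=buffer_size)
    reset_log = []  # (step, index of the agent that was reset)
    next_to_reset = 0
    for step in range(1, total_steps + 1):
        buffer.append((step, rng.random()))  # stand-in for a transition (s, a, r, s')
        batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
        for agent in agents:
            agent.update(batch)
        if step % reset_interval == 0:
            agents[next_to_reset].reinitialize()  # replay buffer is preserved
            reset_log.append((step, next_to_reset))
            next_to_reset = (next_to_reset + 1) % len(agents)
    return buffer, reset_log


ensemble = [Agent(n_params=4, seed=i) for i in range(3)]
buffer, log = train_with_resets(ensemble, total_steps=300, reset_interval=100)
print(log)          # [(100, 0), (200, 1), (300, 2)]
print(len(buffer))  # 300
```

Because the resets are staggered, at every step at least two of the three members in this toy run retain parameters trained on the preserved buffer, which is one way to avoid having the whole agent start from scratch at once.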