Deep reinforcement learning (DRL) is playing an increasingly important role in real-world applications. However, obtaining an optimally performing DRL agent for complex tasks, especially those with sparse rewards, remains a significant challenge. Training of a DRL agent often becomes trapped in a bottleneck and makes no further progress. In this paper, we propose RICE, an innovative refining scheme for reinforcement learning that incorporates explanation methods to break through such training bottlenecks. The high-level idea of RICE is to construct a new initial state distribution that combines the default initial states with critical states identified through explanation methods, thereby encouraging the agent to explore from the mixed initial states. Through careful design, we can theoretically guarantee that our refining scheme has a tighter sub-optimality bound. We evaluate RICE in various popular RL environments and real-world applications. The results demonstrate that RICE significantly outperforms existing refining schemes in enhancing agent performance.
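The mixed initial state distribution can be illustrated with a minimal sketch. This is not the RICE implementation: the function name `sample_initial_state`, the mixing weight `mix_prob`, and the representation of states as plain sequences are all assumptions made for illustration, and the explanation method that would produce `critical_states` is left out entirely.

```python
import random
from typing import Optional, Sequence, TypeVar

State = TypeVar("State")


def sample_initial_state(
    default_states: Sequence[State],
    critical_states: Sequence[State],
    mix_prob: float,
    rng: Optional[random.Random] = None,
) -> State:
    """Draw one reset state from a mixture of default and critical states.

    mix_prob is a hypothetical mixing weight: the probability of starting
    an episode from a critical state rather than a default initial state.
    """
    if not 0.0 <= mix_prob <= 1.0:
        raise ValueError("mix_prob must lie in [0, 1]")
    if rng is None:
        rng = random.Random()
    # Fall back to the default states when no critical states are available.
    if critical_states and rng.random() < mix_prob:
        return rng.choice(critical_states)
    return rng.choice(default_states)
```

In a refining loop of this kind, each episode reset would call `sample_initial_state` instead of always using the environment's default reset, so that part of the agent's experience begins at the states flagged as critical.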