Experience replay (ER) is a crucial component of many deep reinforcement learning (RL) systems. However, uniform sampling from an ER buffer can lead to slow convergence and unstable asymptotic behaviors. This paper introduces Stratified Sampling from Event Tables (SSET), which partitions an ER buffer into Event Tables, each capturing important subsequences of optimal behavior. We prove a theoretical advantage over the traditional monolithic buffer approach and combine SSET with an existing prioritized sampling strategy to further improve learning speed and stability. Empirical results in challenging MiniGrid domains, benchmark RL environments, and a high-fidelity car racing simulator demonstrate the advantages and versatility of SSET over existing ER buffer sampling approaches.
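To make the partitioning idea concrete, here is a minimal Python sketch of a replay buffer split into event tables with stratified minibatch sampling. The abstract does not say how tables are filled or how minibatches are mixed, so everything specific here is an assumption for illustration: the class name `EventTableBuffer`, the predicate-based event definitions, the `history_len` window copied into a table when an event fires, the default table that holds every transition, and the `event_fraction` split. It is not the authors' implementation, and it omits the prioritized sampling strategy that the paper combines with SSET.

```python
import random
from collections import deque


class EventTableBuffer:
    """Toy replay buffer: one default table plus one table per event (hypothetical design)."""

    def __init__(self, event_fns, history_len=10, capacity=10_000, seed=0):
        # event_fns maps an event name to a predicate over a single transition (assumption).
        self.event_fns = event_fns
        self.default = deque(maxlen=capacity)  # holds every transition
        self.tables = {name: deque(maxlen=capacity) for name in event_fns}
        self.recent = deque(maxlen=history_len)  # sliding window over the current episode
        self.rng = random.Random(seed)

    def add(self, transition, done=False):
        self.default.append(transition)
        self.recent.append(transition)
        for name, fires in self.event_fns.items():
            if fires(transition):
                # Copy the subsequence that led up to the event into that event's table.
                self.tables[name].extend(self.recent)
        if done:
            self.recent.clear()  # never let a window span two episodes

    def sample(self, batch_size, event_fraction=0.5):
        if not self.default:
            raise ValueError("cannot sample from an empty buffer")
        non_empty = [t for t in self.tables.values() if t]
        # Stratify: each non-empty event table gets an equal share of the event budget.
        per_table = int(batch_size * event_fraction) // len(non_empty) if non_empty else 0
        batch = []
        for table in non_empty:
            batch.extend(self.rng.choices(list(table), k=per_table))
        # Fill the remainder uniformly from the default table.
        batch.extend(self.rng.choices(list(self.default), k=batch_size - len(batch)))
        return batch


# Usage: a 20-step episode whose final step triggers a hypothetical "goal" event.
buf = EventTableBuffer({"goal": lambda t: t["reward"] > 0}, history_len=3)
for step in range(20):
    buf.add({"step": step, "reward": 1.0 if step == 19 else 0.0}, done=(step == 19))
batch = buf.sample(8)
```

In this sketch, half of each minibatch is drawn from transitions that preceded an event, no matter how rare those transitions are in the default table. That is the contrast with uniform sampling from a single monolithic buffer. A transition can be copied into a table more than once if events fire close together; a fuller implementation would likely deduplicate or weight such entries.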