Replaying data is a principal mechanism underlying the stability and data efficiency of off-policy reinforcement learning (RL). We present a simple yet effective framework that extends the use of replay across multiple experiments, requiring only minimal changes to the RL workflow while yielding sizeable improvements in controller performance and research iteration times. At its core, Replay Across Experiments (RaE) reuses experience from previous experiments to improve exploration and bootstrap learning, while keeping the required changes minimal compared to prior work. We empirically show benefits across a number of RL algorithms and challenging control domains spanning both locomotion and manipulation, including hard exploration tasks from egocentric vision. Through comprehensive ablations, we demonstrate robustness to the quality and amount of available data and to various hyperparameter choices. Finally, we discuss how our approach can be applied more broadly across research life cycles and can increase resilience by reloading data across random seeds or hyperparameter variations.
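The core idea of reusing data from earlier experiments can be sketched as a replay buffer that is pre-loaded with stored transitions and keeps collecting fresh online data, with each training batch drawing from both sources. This is a minimal illustration under stated assumptions, not the paper's implementation: the `MixedReplay` class, the fixed `prior_fraction` mixing ratio, and uniform sampling are all hypothetical choices made for this sketch.

```python
import random
from collections import deque
from typing import Any, List, Sequence


class MixedReplay:
    """Hypothetical replay that mixes reloaded prior-experiment data with online data.

    `prior_fraction` is an assumed knob for this sketch: the share of each batch
    drawn from data saved by earlier experiments (e.g. other seeds or runs).
    """

    def __init__(self, prior_data: Sequence[Any], capacity: int,
                 prior_fraction: float = 0.5, seed: int = 0) -> None:
        if not 0.0 <= prior_fraction <= 1.0:
            raise ValueError("prior_fraction must be in [0, 1]")
        self.prior: List[Any] = list(prior_data)       # fixed data reloaded from earlier runs
        self.online: deque = deque(maxlen=capacity)    # FIFO buffer filled by the current run
        self.prior_fraction = prior_fraction
        self.rng = random.Random(seed)

    def add(self, transition: Any) -> None:
        """Store a transition collected by the current experiment."""
        self.online.append(transition)

    def sample(self, batch_size: int) -> List[Any]:
        """Draw a batch uniformly from each source according to prior_fraction."""
        if not self.prior and not self.online:
            raise ValueError("no data to sample from")
        if not self.prior:
            n_prior = 0                      # no prior data: online only
        elif not self.online:
            n_prior = batch_size             # nothing collected yet: prior only
        else:
            n_prior = round(batch_size * self.prior_fraction)
        n_online = batch_size - n_prior
        batch = [self.rng.choice(self.prior) for _ in range(n_prior)]
        batch += [self.rng.choice(self.online) for _ in range(n_online)]
        self.rng.shuffle(batch)              # avoid ordering by source within the batch
        return batch


# Usage: seed the buffer with data from a previous experiment, then keep training.
prior = [("prior", i) for i in range(100)]
replay = MixedReplay(prior, capacity=1000, prior_fraction=0.5)
for t in range(10):
    replay.add(("online", t))
batch = replay.sample(8)  # 4 prior transitions and 4 online transitions, shuffled
```

Keeping the prior data in a separate, fixed pool (rather than inserting it into the FIFO buffer) means it is never evicted as new experience arrives; whether and how the mixing ratio should change over training is left open here.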