A key theme in the past decade has been that when large neural networks and large datasets combine they can produce remarkable results. In deep reinforcement learning (RL), this paradigm is commonly made possible through experience replay, whereby a dataset of past experiences is used to train a policy or value function. However, unlike in supervised or self-supervised learning, an RL agent has to collect its own data, which is often limited. Thus, it is challenging to reap the benefits of deep learning, and even small neural networks can overfit at the start of training. In this work, we leverage the tremendous recent progress in generative modeling and propose Synthetic Experience Replay (SynthER), a diffusion-based approach to flexibly upsample an agent's collected experience. We show that SynthER is an effective method for training RL agents across offline and online settings, in both proprioceptive and pixel-based environments. In offline settings, we observe drastic improvements when upsampling small offline datasets and see that additional synthetic data also allows us to effectively train larger networks. Furthermore, SynthER enables online agents to train with a much higher update-to-data ratio than before, leading to a significant increase in sample efficiency, without any algorithmic changes. We believe that synthetic training data could open the door to realizing the full potential of deep learning for replay-based RL algorithms from limited data. Finally, we open-source our code at https://github.com/conglu1997/SynthER.
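The idea of upsampling collected experience and training on it can be illustrated with a minimal sketch. This is not the SynthER implementation: a Gaussian fitted to the data stands in for the diffusion model, and the class `GaussianUpsampler`, the function `mixed_batch`, the transition layout, and the `real_fraction` parameter are all hypothetical names and choices introduced here for illustration only.

```python
import numpy as np


class GaussianUpsampler:
    """Stand-in generator (NOT a diffusion model): fits a Gaussian to transitions."""

    def fit(self, transitions):
        self.mean = transitions.mean(axis=0)
        # Small diagonal jitter keeps the covariance positive definite.
        dim = transitions.shape[1]
        self.cov = np.cov(transitions, rowvar=False) + 1e-6 * np.eye(dim)
        return self

    def sample(self, n, rng):
        # Draw n synthetic transitions from the fitted distribution.
        return rng.multivariate_normal(self.mean, self.cov, size=n)


def mixed_batch(real, synthetic, batch_size, real_fraction, rng):
    """Draw a training batch with a fixed fraction of real transitions."""
    n_real = int(round(batch_size * real_fraction))
    n_syn = batch_size - n_real
    real_idx = rng.integers(0, len(real), size=n_real)
    syn_idx = rng.integers(0, len(synthetic), size=n_syn)
    return np.concatenate([real[real_idx], synthetic[syn_idx]], axis=0)


rng = np.random.default_rng(0)

# Assumed layout: each row is a flattened (state, action, reward, next_state)
# transition; random values stand in for experience collected by an agent.
real = rng.normal(size=(200, 8))

generator = GaussianUpsampler().fit(real)
synthetic = generator.sample(2000, rng)  # upsample 200 real transitions 10x

batch = mixed_batch(real, synthetic, batch_size=256, real_fraction=0.5, rng=rng)
print(batch.shape)  # (256, 8)
```

In a replay-based agent, a batch like this would be passed to the unchanged policy or value update. Because the synthetic pool can be made much larger than the real dataset, more gradient updates can be taken per environment step; the 50/50 mixing ratio here is an arbitrary illustrative choice.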