A key theme in the past decade has been that when large neural networks and large datasets combine, they can produce remarkable results. In deep reinforcement learning (RL), this paradigm is commonly made possible through experience replay, whereby a dataset of past experiences is used to train a policy or value function. However, unlike in supervised or self-supervised learning, an RL agent has to collect its own data, which is often limited. Thus, it is challenging to reap the benefits of deep learning, and even small neural networks can overfit at the start of training. In this work, we leverage the tremendous recent progress in generative modeling and propose Synthetic Experience Replay (SynthER), a diffusion-based approach to flexibly upsample an agent's collected experience. We show that SynthER is an effective method for training RL agents across offline and online settings, in both proprioceptive and pixel-based environments. In offline settings, we observe drastic improvements when upsampling small offline datasets and see that additional synthetic data also allows us to effectively train larger networks. Furthermore, SynthER enables online agents to train with a much higher update-to-data ratio than before, leading to a significant increase in sample efficiency, without any algorithmic changes. We believe that synthetic training data could open the door to realizing the full potential of deep learning for replay-based RL algorithms from limited data. Finally, we open-source our code at https://github.com/conglu1997/SynthER.
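The data flow described above (fit a generative model to the agent's collected transitions, sample many synthetic transitions from it, and train on batches drawn from the enlarged pool) can be sketched as follows. This is a minimal illustration under loud assumptions, not the SynthER implementation: transitions are assumed to be flattened into fixed-length float vectors, and a per-dimension Gaussian (the hypothetical `DiagonalGaussianGenerator`) stands in for the diffusion model, which it does not attempt to reproduce. The helper names `upsample` and `sample_mixed_batch` and the `real_fraction` mixing parameter are likewise illustrative assumptions, not the API of the SynthER repository.

```python
import random
import statistics
from typing import List, Sequence

# Hypothetical flat encoding of one transition, e.g. concat(s, a, r, s', done).
Transition = List[float]


class DiagonalGaussianGenerator:
    """Hypothetical stand-in for SynthER's diffusion model.

    Fits an independent Gaussian to each dimension of the flattened
    transitions. It exposes the same fit/sample interface a real generative
    model would, but ignores all correlations between dimensions.
    """

    def fit(self, data: Sequence[Transition]) -> None:
        dims = list(zip(*data))  # column-wise view: one tuple per dimension
        self.means = [statistics.fmean(d) for d in dims]
        self.stds = [statistics.pstdev(d) for d in dims]

    def sample(self, n: int, rng: random.Random) -> List[Transition]:
        return [
            [rng.gauss(m, s) for m, s in zip(self.means, self.stds)]
            for _ in range(n)
        ]


def upsample(real: Sequence[Transition], n_synthetic: int, seed: int = 0) -> List[Transition]:
    """Fit the stand-in generator on real transitions and draw synthetic ones."""
    rng = random.Random(seed)
    gen = DiagonalGaussianGenerator()
    gen.fit(real)
    return gen.sample(n_synthetic, rng)


def sample_mixed_batch(
    real: Sequence[Transition],
    synthetic: Sequence[Transition],
    batch_size: int,
    real_fraction: float,
    rng: random.Random,
) -> List[Transition]:
    """Draw a training batch with `real_fraction` of its samples from real data."""
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must lie in [0, 1]")
    n_real = round(batch_size * real_fraction)
    batch = [rng.choice(real) for _ in range(n_real)]
    batch += [rng.choice(synthetic) for _ in range(batch_size - n_real)]
    return batch


if __name__ == "__main__":
    real = [[0.0, 1.0, 0.5], [0.2, 0.8, 0.7], [0.1, 0.9, 0.6]]
    synthetic = upsample(real, n_synthetic=1000)
    batch = sample_mixed_batch(real, synthetic, batch_size=256,
                               real_fraction=0.5, rng=random.Random(1))
    print(len(synthetic), len(batch))  # 1000 256
```

With a higher update-to-data ratio, an agent would call a sampler like `sample_mixed_batch` many times per environment step; drawing part of each batch from a synthetic pool is one way those extra updates can avoid repeatedly reusing the same few real transitions. A diagonal Gaussian cannot capture the dependencies between state, action, reward and next state, which is the role a stronger generative model such as a diffusion model is meant to fill.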