A key theme in the past decade has been that when large neural networks and large datasets combine they can produce remarkable results. In deep reinforcement learning (RL), this paradigm is commonly made possible through experience replay, whereby a dataset of past experiences is used to train a policy or value function. However, unlike in supervised or self-supervised learning, an RL agent has to collect its own data, which is often limited. Thus, it is challenging to reap the benefits of deep learning, and even small neural networks can overfit at the start of training. In this work, we leverage the tremendous recent progress in generative modeling and propose Synthetic Experience Replay (SynthER), a diffusion-based approach to flexibly upsample an agent's collected experience. We show that SynthER is an effective method for training RL agents across offline and online settings, in both proprioceptive and pixel-based environments. In offline settings, we observe drastic improvements when upsampling small offline datasets and see that additional synthetic data also allows us to effectively train larger networks. Furthermore, SynthER enables online agents to train with a much higher update-to-data ratio than before, leading to a significant increase in sample efficiency, without any algorithmic changes. We believe that synthetic training data could open the door to realizing the full potential of deep learning for replay-based RL algorithms from limited data. Finally, we open-source our code at https://github.com/conglu1997/SynthER.
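The replay-side idea described above, mixing collected and synthetic transitions and taking several gradient updates per environment step, can be illustrated with a minimal sketch. This is not the authors' implementation: `jitter_generator` is a hypothetical stand-in for a trained diffusion model, and `real_fraction`, the refresh interval, the synthetic pool size, and the update-to-data ratio are illustrative values rather than settings taken from the paper.

```python
import random
from typing import List, Tuple

# A transition is (state, action, reward, next_state); scalars keep the sketch small.
Transition = Tuple[float, ...]


def jitter_generator(real: List[Transition], n: int, rng: random.Random,
                     noise: float = 0.01) -> List[Transition]:
    """Hypothetical stand-in for a trained diffusion model (assumption).

    It resamples real transitions and adds small Gaussian noise. A real
    generative model would be fit to `real` and sampled from instead.
    """
    out = []
    for _ in range(n):
        base = rng.choice(real)
        out.append(tuple(x + rng.gauss(0.0, noise) for x in base))
    return out


def sample_mixed_batch(real: List[Transition], synthetic: List[Transition],
                       batch_size: int, real_fraction: float,
                       rng: random.Random) -> List[Transition]:
    """Draw a batch with roughly `real_fraction` real data, the rest synthetic."""
    n_real = round(batch_size * real_fraction)
    n_syn = batch_size - n_real
    batch = [rng.choice(real) for _ in range(n_real)]
    batch += [rng.choice(synthetic) for _ in range(n_syn)]
    rng.shuffle(batch)
    return batch


def run(env_steps: int, utd_ratio: int, batch_size: int = 8,
        real_fraction: float = 0.5, seed: int = 0) -> Tuple[int, int, int]:
    """Toy online loop: collect data, refresh synthetic data, update the agent."""
    rng = random.Random(seed)
    real: List[Transition] = []
    synthetic: List[Transition] = []
    updates = 0
    for t in range(env_steps):
        # Placeholder for one environment interaction.
        real.append((float(t), rng.random(), rng.random(), float(t + 1)))
        # Every 10 steps, regenerate a synthetic pool 5x the size of the real data.
        if (t + 1) % 10 == 0:
            synthetic = jitter_generator(real, n=5 * len(real), rng=rng)
        if synthetic:
            # Update-to-data ratio: `utd_ratio` updates per environment step.
            for _ in range(utd_ratio):
                batch = sample_mixed_batch(real, synthetic, batch_size,
                                           real_fraction, rng)
                updates += 1  # placeholder for a gradient step on `batch`
    return len(real), len(synthetic), updates


if __name__ == "__main__":
    print(run(env_steps=30, utd_ratio=20))
```

The agent's update rule is untouched in this sketch; only the source of each training batch changes, which matches the claim that no algorithmic changes are needed.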