Q-learning algorithms are appealing for real-world applications due to their data efficiency, but they are highly prone to overfitting and training instabilities when trained from visual observations. Prior work, namely SVEA, finds that selective application of data augmentation can improve the visual generalization of RL agents without destabilizing training. We revisit its recipe for data augmentation, and find an assumption that limits its effectiveness to augmentations of a photometric nature. Addressing these limitations, we propose a generalized recipe, SADA, that works with a wider variety of augmentations. We benchmark its effectiveness on DMC-GB2, our proposed extension of the popular DMControl Generalization Benchmark, as well as on tasks from Meta-World and the Distracting Control Suite, and find that SADA greatly improves the training stability and generalization of RL agents across a diverse set of augmentations. Visualizations, code, and benchmarks are available at https://aalmuzairee.github.io/SADA/
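To make the idea of "selective application of data augmentation" concrete, here is a minimal sketch of the core trick usually attributed to SVEA: the TD target is computed only from clean (unaugmented) observations, while the Q-function is regressed toward that target on both the clean and the augmented view. This is not the authors' implementation; the network sizes, the noise-based stand-in for an image augmentation, and all tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
obs_dim, act_dim, batch, gamma = 8, 2, 32, 0.99

# Illustrative critic Q(s, a): a small MLP over concatenated (obs, action).
def make_critic():
    return nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

critic = make_critic()
target_critic = make_critic()
target_critic.load_state_dict(critic.state_dict())  # frozen copy for TD targets

# Dummy replay-buffer batch (states would be image features in practice).
obs = torch.randn(batch, obs_dim)
act = torch.randn(batch, act_dim)
rew = torch.randn(batch, 1)
next_obs = torch.randn(batch, obs_dim)
next_act = torch.randn(batch, act_dim)  # would come from the policy

# Stand-in for a visual augmentation (e.g. random shift) of the observation.
aug_obs = obs + 0.1 * torch.randn_like(obs)

# Key point: the TD target sees only clean observations, and no gradients.
with torch.no_grad():
    td_target = rew + gamma * target_critic(torch.cat([next_obs, next_act], -1))

# Both the clean and the augmented stream regress toward the same clean target.
q_clean = critic(torch.cat([obs, act], -1))
q_aug = critic(torch.cat([aug_obs, act], -1))
loss = 0.5 * (F.mse_loss(q_clean, td_target) + F.mse_loss(q_aug, td_target))
loss.backward()
print(torch.isfinite(loss).item())
```

Keeping the target clean is what prevents the augmentation noise from being bootstrapped into the Q-values; SADA's contribution, per the abstract, is generalizing this recipe beyond photometric augmentations.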