Various data augmentation techniques have recently been proposed in image-based deep reinforcement learning (DRL). Although they empirically demonstrate the effectiveness of data augmentation for improving sample efficiency or generalization, it is not always clear which technique should be preferred. To tackle this question, we analyze existing methods to better understand them and to uncover how they are connected. Notably, by expressing the variance of the Q-targets and that of the empirical actor/critic losses of these methods, we can analyze the effects of their different components and compare them. We furthermore formulate an explanation of how these methods may be affected by the choice of data augmentation transformations used to calculate the target Q-values. This analysis suggests recommendations on how to exploit data augmentation in a more principled way. In addition, we include a regularization term called tangent prop, previously proposed in computer vision, whose adaptation to DRL is novel to the best of our knowledge. We evaluate our proposed method and validate our analysis in several domains. Compared to different relevant baselines, we demonstrate that our method achieves state-of-the-art performance in most environments and shows higher sample efficiency and better generalization ability in some complex environments.
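To make the idea behind the tangent prop term concrete, the following is a minimal sketch rather than the method evaluated here. It assumes a linear critic q(obs) = <weights, obs> and a circular horizontal shift as the augmentation; the function names `shift_tangent` and `tangent_prop_penalty` are hypothetical. The penalty is the squared derivative of the critic output along the direction in which the observation moves under an infinitesimal transformation, so a critic that is invariant to the transformation incurs (numerically) zero penalty.

```python
import numpy as np


def shift_tangent(obs):
    """Central-difference estimate of d(obs shifted by alpha pixels)/d(alpha) at alpha = 0.

    Assumption: the augmentation is a circular shift along the last axis.
    """
    return (np.roll(obs, 1, axis=-1) - np.roll(obs, -1, axis=-1)) / 2.0


def tangent_prop_penalty(weights, obs):
    """Squared directional derivative of a linear critic along the shift tangent.

    Assumption: q(obs) = sum(weights * obs), so the gradient of q with respect
    to obs is simply `weights`.
    """
    tangent = shift_tangent(obs)
    directional_derivative = np.sum(weights * tangent)
    return float(directional_derivative ** 2)


rng = np.random.default_rng(0)
obs = rng.normal(size=(8, 8))

# A critic with constant weights only sees the pixel sum, which a circular
# shift leaves unchanged, so it is shift-invariant.
invariant_w = np.ones((8, 8))
# A critic with random weights generally changes its output under a shift.
sensitive_w = rng.normal(size=(8, 8))

penalty_invariant = tangent_prop_penalty(invariant_w, obs)
penalty_sensitive = tangent_prop_penalty(sensitive_w, obs)
print(penalty_invariant, penalty_sensitive)
```

In a deep critic, the gradient with respect to the observation would come from automatic differentiation instead of being read off the weights, and the penalty would be added to the critic loss with a weighting coefficient; both of these are assumptions of this sketch.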