Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms. Notably, simple observation transformations alone can yield outstanding performance without any auxiliary representation tasks or pre-trained encoders. However, it remains unclear which attributes of DA account for its effectiveness in sample-efficient visual RL. To investigate this question and further explore the potential of DA, this work conducts comprehensive experiments to assess how DA's attributes affect its efficacy, and provides the following insights and improvements: (1) For individual DA operations, we reveal that both ample spatial diversity and slight hardness are indispensable. Building on this finding, we introduce Random PadResize (Rand PR), a new DA operation that offers abundant spatial diversity with minimal hardness. (2) For multi-type DA fusion schemes, increased DA hardness and an unstable data distribution prevent current fusion schemes from achieving higher sample efficiency than their corresponding individual operations. Taking the non-stationary nature of RL into account, we propose an RL-tailored multi-type DA fusion scheme called Cycling Augmentation (CycAug), which periodically cycles through different DA operations to increase type diversity while maintaining a consistent data distribution. Extensive evaluations on the DeepMind Control suite and the CARLA driving simulator demonstrate that our methods achieve superior sample efficiency compared with prior state-of-the-art methods.
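The abstract describes Rand PR only at a high level: pad the observation, then resize it back. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a single-channel image stored as a list of rows, zero-valued padding, an independent pad amount in `[0, max_pad]` for each side, and nearest-neighbour resizing. The function name `rand_pad_resize` and the `max_pad` parameter are hypothetical. In this sketch the random per-side padding both rescales and translates the scene while keeping it inside the frame, which is one plausible reading of "abundant spatial diversity with minimal hardness".

```python
import random
from typing import List, Optional

Image = List[List[float]]  # single-channel H x W image, stored as a list of rows


def rand_pad_resize(img: Image, max_pad: int = 4,
                    rng: Optional[random.Random] = None) -> Image:
    """Hypothetical Rand PR sketch: random zero-padding, then resize back to H x W.

    Zero padding, per-side pad amounts in [0, max_pad] and nearest-neighbour
    resizing are assumptions, not details confirmed by the paper.
    """
    if rng is None:
        rng = random.Random()
    h, w = len(img), len(img[0])
    # Draw an independent pad amount for each side of the image.
    top, bottom, left, right = (rng.randint(0, max_pad) for _ in range(4))
    ph, pw = h + top + bottom, w + left + right
    # Place the original image on a larger, zero-filled canvas.
    padded = [[0.0] * pw for _ in range(ph)]
    for i in range(h):
        for j in range(w):
            padded[i + top][j + left] = img[i][j]
    # Nearest-neighbour (floor-index) resize from ph x pw back to h x w:
    # the scene is shrunk and shifted inside the frame.
    return [[padded[i * ph // h][j * pw // w] for j in range(w)]
            for i in range(h)]
```

With `max_pad=0` no padding is drawn and the function returns the input unchanged; larger values of `max_pad` allow stronger scale and position changes.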
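CycAug is also described only in outline: different DA operations are applied in periodic cycles. The sketch below illustrates that idea under assumptions that are not taken from the paper. Augmentation operations are plain callables, and the scheme switches to the next operation every `period` training steps in a fixed round-robin order. The class name `CyclingAugmentation`, the `period` parameter, and the round-robin order are hypothetical. Within one period every observation receives the same operation, so the augmented data distribution stays fixed. Across periods the agent still sees several DA types.

```python
from typing import Any, Callable, List, Sequence

Augment = Callable[[Any], Any]  # hypothetical: observation -> augmented observation


class CyclingAugmentation:
    """Hypothetical CycAug sketch: round-robin over DA ops, switching every `period` steps.

    The fixed order and the step-based period are illustrative assumptions.
    """

    def __init__(self, ops: Sequence[Augment], period: int) -> None:
        if len(ops) == 0 or period <= 0:
            raise ValueError("need at least one op and a positive period")
        self.ops: List[Augment] = list(ops)
        self.period = period

    def current_index(self, step: int) -> int:
        # Each block of `period` consecutive steps shares one operation.
        return (step // self.period) % len(self.ops)

    def __call__(self, obs: Any, step: int) -> Any:
        # Every observation at this step gets the same operation, keeping the
        # augmented data distribution consistent within the current period.
        return self.ops[self.current_index(step)](obs)
```

For example, with two operations and `period=3`, steps 0 to 2 use the first operation, steps 3 to 5 use the second, and step 6 returns to the first. By contrast, picking a random operation for every image mixes several augmented distributions into each batch.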