Data augmentation is a widely used technique for improving model performance in machine learning, particularly in computer vision and natural language processing. Recently, there has been increasing interest in applying augmentation techniques to reinforcement learning (RL) problems, with a focus on image-based augmentation. In this paper, we explore a set of generic wrappers that augment RL environments with noise in order to encourage agent exploration and improve training data diversity. The wrappers are applicable to a broad spectrum of RL algorithms and environments. Specifically, we concentrate on augmentations of states, rewards, and transition dynamics, and we introduce two novel augmentation techniques. In addition, we introduce a noise rate hyperparameter that controls the frequency of noise injection. We present experimental results on the impact of these wrappers on return using three popular RL algorithms, Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), and Proximal Policy Optimization (PPO), across five MuJoCo environments. To support the choice of augmentation technique in practice, we also present an analysis of the performance of these techniques across environments. Lastly, we publish the wrappers in our noisyenv repository for use with gym environments.
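To make the idea of a noise wrapper with a noise rate concrete, the following is a minimal sketch of reward augmentation, not the noisyenv implementation. The names `ToyEnv`, `NoisyRewardWrapper`, `noise_scale`, and `seed` are hypothetical, and the toy environment only mimics a gym-style `reset`/`step` interface so that the example runs without any dependencies. We assume zero-mean Gaussian noise here; the noise distributions used by the actual wrappers may differ.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a gym-style environment (not part of noisyenv)."""

    def reset(self):
        self.t = 0
        return [0.0, 0.0]

    def step(self, action):
        self.t += 1
        obs = [float(self.t), float(action)]
        reward = 1.0              # constant reward keeps the noise easy to see
        done = self.t >= 5        # episode ends after five steps
        return obs, reward, done, {}


class NoisyRewardWrapper:
    """Illustrative wrapper: perturbs the reward on a fraction of steps.

    noise_rate  -- probability that noise is injected on a given step
    noise_scale -- standard deviation of the Gaussian noise (assumed here)
    """

    def __init__(self, env, noise_rate=0.1, noise_scale=0.5, seed=None):
        if not 0.0 <= noise_rate <= 1.0:
            raise ValueError("noise_rate must be in [0, 1]")
        self.env = env
        self.noise_rate = noise_rate
        self.noise_scale = noise_scale
        self.rng = random.Random(seed)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Inject noise with probability noise_rate; otherwise pass through.
        if self.rng.random() < self.noise_rate:
            reward += self.rng.gauss(0.0, self.noise_scale)
        return obs, reward, done, info
```

With `noise_rate=0.0` the wrapper leaves the environment unchanged, and with `noise_rate=1.0` every reward is perturbed. The same pattern would extend to states by perturbing `obs` instead of `reward`.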