Reinforcement learning (RL) is known to lack generalization and robustness under environmental perturbations, which severely restricts its application to real-world robotics. Prior work has claimed that adding regularization to the value function is equivalent to learning a robust policy under uncertain transitions. Although this regularization-robustness transformation is appealing for its simplicity and efficiency, it is still missing for continuous control tasks. In this paper, we propose a new regularizer, the $\textbf{U}$ncertainty $\textbf{S}$et $\textbf{R}$egularizer (USR), by formulating the uncertainty set on the parameter space of the transition function. USR is flexible enough to be plugged into any existing RL framework. To deal with unknown uncertainty sets, we further propose a novel adversarial approach that generates them based on the value function. We evaluate USR on the Real-world Reinforcement Learning (RWRL) benchmark and demonstrate improved robust performance in perturbed testing environments.
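To make the regularization-robustness idea concrete, the following is a minimal sketch and not the paper's USR formulation. It assumes a hypothetical one-dimensional linear transition $s' = w_0 s + w_1 a$, a hypothetical quadratic value function $V(s') = -s'^2$, and an $L_2$-ball uncertainty set of radius `eps` around the nominal transition parameters `w_bar`. Under those assumptions, the worst-case next-state value over the ball is approximated to first order by the nominal value minus `eps` times the norm of the gradient of the value with respect to the transition parameters. All function and parameter names are illustrative.

```python
import math


def next_state(s, a, w):
    """Hypothetical linear transition model: s' = w[0] * s + w[1] * a."""
    return w[0] * s + w[1] * a


def value(s):
    """Hypothetical value function, assumed quadratic for illustration."""
    return -s * s


def value_grad_wrt_w(s, a, w):
    """Gradient of V(s'(w)) with respect to the transition parameters w.

    Chain rule: dV/dw = (dV/ds') * (ds'/dw) = (-2 * s') * (s, a).
    """
    dv_ds = -2.0 * next_state(s, a, w)
    return (dv_ds * s, dv_ds * a)


def regularized_target(r, s, a, w_bar, eps, gamma=0.99):
    """Bellman target with a first-order robustness penalty.

    Approximates min over {w : ||w - w_bar||_2 <= eps} of V(s'(w)) by
    V(s'(w_bar)) - eps * ||grad_w V||_2 (an assumption of this sketch).
    """
    nominal = value(next_state(s, a, w_bar))
    g = value_grad_wrt_w(s, a, w_bar)
    penalty = eps * math.hypot(g[0], g[1])
    return r + gamma * (nominal - penalty)


if __name__ == "__main__":
    w_bar = (0.9, 0.2)
    print(regularized_target(0.0, 1.0, 0.5, w_bar, eps=0.0))  # nominal target
    print(regularized_target(0.0, 1.0, 0.5, w_bar, eps=0.1))  # penalized target
```

With `eps=0` the target reduces to the ordinary Bellman target. A larger `eps` lowers the target more for state-action pairs whose value is sensitive to the transition parameters, which is the sense in which a value-function regularizer can stand in for an explicit worst-case search.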