Reinforcement learning (RL) is known to lack generalization and robustness under environmental perturbations, which severely restricts its application to real-world robotics. Prior work has claimed that adding regularization to the value function is equivalent to learning a robust policy under uncertain transitions. Although this regularization-robustness transformation is appealing for its simplicity and efficiency, it remains underdeveloped for continuous control tasks. In this paper, we propose a new regularizer, the $\textbf{U}$ncertainty $\textbf{S}$et $\textbf{R}$egularizer (USR), which formulates the uncertainty set on the parameter space of the transition function. USR is flexible enough to be plugged into any existing RL framework. To deal with unknown uncertainty sets, we further propose a novel adversarial approach that generates them based on the value function. We evaluate USR on the Real-world Reinforcement Learning (RWRL) benchmark and demonstrate improved robust performance in perturbed testing environments.