Reinforcement learning (RL) is known to lack generalization and robustness under environmental perturbations, which severely restricts its application to real-world robotics. Prior work claimed that adding regularization to the value function is equivalent to learning a robust policy under uncertain transitions. Although this regularization-robustness transformation is appealing for its simplicity and efficiency, it remains underdeveloped for continuous control tasks. In this paper, we propose a new regularizer, the $\textbf{U}$ncertainty $\textbf{S}$et $\textbf{R}$egularizer (USR), obtained by formulating the uncertainty set on the parameter space of the transition function. Notably, USR is flexible enough to be plugged into any existing RL framework. To handle unknown uncertainty sets, we further propose a novel adversarial approach that generates them based on the value function. We evaluate USR on the Real-world Reinforcement Learning (RWRL) benchmark, demonstrating improved robust performance in perturbed testing environments.