Reinforcement Learning (RL) has proven largely effective at obtaining stable locomotion gaits for legged robots. However, designing control algorithms that can robustly navigate unseen environments with obstacles remains an open problem in quadruped locomotion. To tackle this, it is convenient to solve navigation tasks with a hierarchical approach combining a low-level locomotion policy and a high-level navigation policy. Crucially, the high-level policy must be robust to dynamic obstacles along the agent's path. In this work, we propose a novel way to endow navigation policies with robustness through a training process that models obstacles as adversarial agents, following the adversarial RL paradigm. Importantly, to improve the reliability of the training process, we bound the rationality of the adversarial agent by resorting to quantal response equilibria, and place a curriculum over its rationality. We call this method Hierarchical policies via Quantal response Adversarial Reinforcement Learning (Hi-QARL). We demonstrate the robustness of our method by benchmarking it in unseen randomized mazes with multiple obstacles. To prove its applicability in real scenarios, we apply our method to a Unitree GO1 robot in simulation.
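The bounded-rationality idea above can be illustrated with the standard logit form of a quantal response: the adversary picks actions with probability proportional to exp(λ·Q), where λ controls its rationality. The sketch below is a minimal, hypothetical illustration (function names and the λ schedule are assumptions, not the paper's implementation); λ = 0 yields a uniformly random adversary, and raising λ along a curriculum approaches a fully rational best response.

```python
import numpy as np

def quantal_response(q_values, lam):
    """Logit quantal response: P(a) proportional to exp(lam * Q(a)).

    lam = 0 gives uniform random play; as lam grows, the distribution
    concentrates on the action with the highest Q-value (best response).
    """
    z = lam * (q_values - np.max(q_values))  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical curriculum over the adversary's rationality: start from a
# near-random adversary and gradually approach a fully rational one.
q = np.array([1.0, 0.5, -0.2])  # illustrative adversary Q-values
for lam in [0.0, 1.0, 10.0]:
    probs = quantal_response(q, lam)
```

With λ = 0 every action is equally likely; by λ = 10 nearly all probability mass sits on the highest-value action, so the curriculum smoothly interpolates between an easy (random) and a hard (rational) adversary.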