The robustness of legged locomotion is crucial for quadrupedal robots operating in challenging terrains. Recently, Reinforcement Learning (RL) has shown promising results in legged locomotion, and various methods integrate privileged knowledge distillation, scene modeling, and external sensors to improve the generalization and robustness of locomotion policies. However, these methods struggle to handle uncertain scenarios such as abrupt terrain changes or unexpected external forces. In this paper, we take a novel risk-sensitive perspective to enhance the robustness of legged locomotion. Specifically, we employ a distributional value function learned by quantile regression to model the aleatoric uncertainty of the environment, and perform risk-averse policy learning by optimizing for worst-case scenarios via a risk distortion measure. Extensive experiments in simulation and on a real Aliengo robot demonstrate that our method effectively handles various external disturbances, and that the resulting policy is more robust in harsh and uncertain legged-locomotion situations. Videos are available at https://risk-averse-locomotion.github.io/.
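The abstract names two ingredients: a distributional critic trained by quantile regression and a risk distortion measure that turns the learned return distribution into a pessimistic value. The sketch below illustrates both. It rests on assumptions that the abstract does not state: the quantile Huber loss with midpoint fractions τ_i = (2i+1)/(2N) (as in QR-DQN) and CVaR_α as the distortion. The helper names (`quantile_huber_loss`, `cvar_distortion`, `distorted_value`) are hypothetical, and this is not the authors' implementation.

```python
from typing import Callable, List, Sequence


def quantile_midpoints(n: int) -> List[float]:
    """Quantile fractions tau_i = (2i + 1) / (2n); an assumed QR-DQN-style choice."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def quantile_huber_loss(pred: Sequence[float], targets: Sequence[float], kappa: float = 1.0) -> float:
    """Quantile Huber loss between predicted return quantiles and target return samples.

    Summed over predicted quantiles and averaged over target samples; kappa must be > 0.
    """
    taus = quantile_midpoints(len(pred))
    total = 0.0
    for tau, theta in zip(taus, pred):
        for z in targets:
            u = z - theta  # TD error for this (quantile, target sample) pair
            huber = 0.5 * u * u if abs(u) <= kappa else kappa * (abs(u) - 0.5 * kappa)
            weight = abs(tau - (1.0 if u < 0 else 0.0))  # asymmetric quantile weight
            total += weight * huber / kappa
    return total / len(targets)


def cvar_distortion(alpha: float) -> Callable[[float], float]:
    """CVaR_alpha distortion g(tau) = min(tau / alpha, 1): all weight on the worst alpha-fraction."""
    return lambda tau: min(tau / alpha, 1.0)


def distorted_value(quantiles: Sequence[float], g: Callable[[float], float]) -> float:
    """Risk-distorted expectation sum_i (g(tau_{i+1}) - g(tau_i)) * q_i, with tau_i = i / n."""
    qs = sorted(quantiles)  # quantile estimates are expected to be non-decreasing
    n = len(qs)
    return sum((g((i + 1) / n) - g(i / n)) * q for i, q in enumerate(qs))
```

For example, with return quantiles `[0.0, 1.0, 2.0, 3.0]`, `distorted_value(qs, cvar_distortion(1.0))` recovers the risk-neutral mean 1.5, while `cvar_distortion(0.5)` keeps only the worst half and yields 0.5. A risk-averse policy trained against the distorted value is therefore pushed toward actions whose bad outcomes are less bad. Other distortions (for example Wang or CPW) plug into `distorted_value` the same way.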