Deployment in hazardous environments requires robots to understand the risks associated with their actions and movements in order to prevent accidents. Despite their importance, these risks are not explicitly modeled by currently deployed locomotion controllers for legged robots. In this work, we propose a risk-sensitive locomotion training method that employs distributional reinforcement learning to consider safety explicitly. Instead of relying on a value expectation, we estimate the complete value distribution to account for uncertainty in the robot's interaction with the environment. The value distribution is consumed by a risk metric to extract risk-sensitive value estimates. These are integrated into Proximal Policy Optimization (PPO) to derive our method, Distributional Proximal Policy Optimization (DPPO). The risk preference, ranging from risk-averse to risk-seeking, can be controlled by a single parameter, which enables the robot's behavior to be adjusted dynamically. Importantly, our approach removes the need for additional reward function tuning to achieve risk sensitivity. We show emergent risk-sensitive locomotion behavior in simulation and on the quadrupedal robot ANYmal. Videos of the experiments and code are available at https://sites.google.com/leggedrobotics.com/risk-aware-locomotion.