Ensuring the safe operation of legged robots in uncertain, novel environments is crucial to their widespread adoption. Despite recent advances in safety filters that can keep arbitrary task-driven policies from incurring safety failures, existing solutions for legged robot locomotion still rely on simplified dynamics and may fail when the robot is perturbed away from predefined stable gaits. This paper presents a general approach that leverages offline game-theoretic reinforcement learning to synthesize a highly robust safety filter for high-order nonlinear dynamics. This gameplay filter then maintains runtime safety by continually simulating adversarial futures and precluding task-driven actions that would cause the robot to lose future games (and thereby violate safety). Validated on a 36-dimensional quadruped robot locomotion task, the gameplay safety filter exhibits inherent robustness to the sim-to-real gap without manual tuning or heuristic designs. Physical experiments demonstrate the effectiveness of the gameplay safety filter under perturbations, such as tugging and unmodeled irregular terrains, while simulation studies shed light on how to trade off computation and conservativeness without compromising safety.
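The runtime logic described above can be illustrated with a minimal sketch: a filter that simulates an adversarial future from the state the task-driven action would produce, and overrides that action with the safety policy whenever the imagined game is lost. All function names, the toy one-dimensional dynamics, and the hand-coded policies below are illustrative assumptions, not the paper's actual learned controllers.

```python
# Minimal sketch of a gameplay safety filter (illustrative assumptions:
# toy 1-D dynamics and hand-coded policies stand in for the paper's
# learned game-theoretic RL policies).

def step(x, u, d):
    """Toy scalar dynamics: control u and adversarial disturbance d act on state x."""
    return x + u + d

def safety_policy(x):
    """Stand-in for the learned best-effort safe controller (pushes x toward 0)."""
    return -0.5 * x

def adversary_policy(x):
    """Stand-in for the learned worst-case disturbance (pushes x away from 0)."""
    return 0.2 if x >= 0 else -0.2

def is_safe(x):
    """Failure set: states with |x| > 1 count as safety violations."""
    return abs(x) <= 1.0

def loses_future_game(x, horizon=20):
    """Imagine the adversarial future: roll out the safety policy against
    the adversary and report whether any state violates safety."""
    for _ in range(horizon):
        if not is_safe(x):
            return True
        x = step(x, safety_policy(x), adversary_policy(x))
    return not is_safe(x)

def gameplay_filter(x, task_action):
    """Admit the task-driven action only if the safety policy can still win
    the imagined game from the resulting state; otherwise override."""
    x_next = step(x, task_action, adversary_policy(x))
    if loses_future_game(x_next):
        return safety_policy(x)  # task action rejected as unsafe
    return task_action
```

The rollout horizon is the knob the simulation studies trade off: a longer imagined game costs more computation per control step but detects losing states earlier, which is the computation-versus-conservativeness trade-off the abstract mentions.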