Ensuring the safe operation of legged robots in uncertain, novel environments is crucial to their widespread adoption. Despite recent advances in safety filters that can keep arbitrary task-driven policies from incurring safety failures, existing solutions for legged robot locomotion still rely on simplified dynamics and may fail when the robot is perturbed away from predefined stable gaits. This paper presents a general approach that leverages offline game-theoretic reinforcement learning to synthesize a highly robust safety filter for high-order nonlinear dynamics. This gameplay filter then maintains runtime safety by continually simulating adversarial futures and precluding task-driven actions that would cause the robot to lose future games (and thereby violate safety). Validated on a 36-dimensional quadruped robot locomotion task, the gameplay safety filter exhibits inherent robustness to the sim-to-real gap without manual tuning or heuristic designs. Physical experiments demonstrate the effectiveness of the gameplay safety filter under perturbations, such as tugging and unmodeled irregular terrains, while simulation studies shed light on how to trade off computation and conservativeness without compromising safety.
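The filtering logic described above can be sketched in a minimal form: at each control step, the candidate task action is simulated one step forward under the learned worst-case disturbance, the learned safety policy is then rolled out against the adversary, and the task action is allowed only if the safety policy still wins the game (no failure over the horizon). The sketch below uses toy 1-D dynamics and hand-written stand-ins for the safety and adversary policies; in the paper these policies come from offline game-theoretic RL on the full robot model, and all names here are illustrative assumptions.

```python
# Sketch of a gameplay safety filter on toy dynamics. The real system
# uses learned safety/adversary policies and full quadruped dynamics;
# everything below is a hypothetical stand-in.

def step(x, u, d):
    """Toy 1-D double-integrator-style dynamics with disturbance d."""
    pos, vel = x
    vel = vel + 0.1 * (u + d)
    pos = pos + 0.1 * vel
    return (pos, vel)

def safe(x):
    """Failure set: |pos| >= 1 counts as a safety violation."""
    return abs(x[0]) < 1.0

def safety_policy(x):
    """Stand-in for the learned best-effort safety controller."""
    return -2.0 * x[0] - 1.0 * x[1]

def adversary_policy(x):
    """Stand-in for the learned worst-case disturbance policy."""
    return 0.3 if x[0] >= 0 else -0.3

def wins_game(x, horizon=50):
    """Roll out the safety policy against the adversary; True if no
    failure occurs over the horizon (the filter 'wins' the game)."""
    for _ in range(horizon):
        if not safe(x):
            return False
        x = step(x, safety_policy(x), adversary_policy(x))
    return safe(x)

def gameplay_filter(x, u_task):
    """Allow the task action only if, after applying it under the
    worst-case disturbance, the safety policy can still win the game;
    otherwise override with the safety action."""
    x_next = step(x, u_task, adversary_policy(x))
    if wins_game(x_next):
        return u_task
    return safety_policy(x)
```

The key design choice mirrored here is that the filter never needs an explicit safe-set description at runtime: safety is certified constructively, by exhibiting a winning rollout of the safety policy against simulated adversarial futures.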