Behavior cloning (BC) is a widely used approach in imitation learning, in which a robot learns a control policy by observing an expert supervisor. However, the learned policy can make errors that lead to safety violations, which limits its utility in safety-critical robotics applications. While prior works have sought to improve a BC policy via additional real or synthetic action labels, adversarial training, or runtime filtering, none of them explicitly focuses on reducing the BC policy's safety violations at training time. We propose SAFE-GIL, a design-time method for learning safety-aware behavior cloning policies. SAFE-GIL deliberately injects adversarial disturbances into the system during data collection to guide the expert toward safety-critical states. This disturbance injection simulates the potential policy errors that the system might encounter at test time. By ensuring that training more closely replicates expert behavior in safety-critical states, our approach yields policies that remain safer despite policy errors at test time. We further develop a reachability-based method to compute this adversarial disturbance. We compare SAFE-GIL with various behavior cloning techniques and online safety-filtering methods in three domains: autonomous ground navigation, aircraft taxiing, and aerial navigation on a quadrotor testbed. Our method demonstrates a significant reduction in safety failures, particularly in low-data regimes where the likelihood of learning errors, and therefore of safety violations, is higher. See our website here: https://y-u-c.github.io/safegil/
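The core idea of disturbance-guided data collection can be illustrated with a toy sketch. This is a minimal illustration under simplifying assumptions, not the paper's actual algorithm: the function names are hypothetical, the dynamics are a 1D integrator, and the "safety value" is simply the distance to an obstacle at x = 0 (a stand-in for the reachability-based value the paper computes). The key pattern is that the disturbance perturbs the dynamics to push the expert toward safety-critical states, while the dataset is still labeled with the expert's own actions:

```python
import numpy as np

# Toy safety value: distance to an obstacle at x = 0.
# Lower value means closer to failure.
def safety_value(x):
    return abs(x)

def adversarial_disturbance(x, u, dt=0.1, d_max=0.5, n=11):
    """Pick the disturbance (from a discretized set) that most reduces the
    safety value of the next state -- a crude stand-in for the paper's
    reachability-based disturbance computation."""
    candidates = np.linspace(-d_max, d_max, n)
    next_values = [safety_value(x + (u + d) * dt) for d in candidates]
    return candidates[int(np.argmin(next_values))]

def expert_policy(x, goal=2.0):
    # Simple proportional expert steering toward the goal.
    return np.clip(goal - x, -1.0, 1.0)

def collect_trajectory(x0=-1.0, steps=50, dt=0.1):
    """Roll out the expert while injecting the adversarial disturbance.
    The dataset records (state, expert action) pairs: the disturbance
    only perturbs the dynamics, never the action labels."""
    x, data = x0, []
    for _ in range(steps):
        u = expert_policy(x)
        data.append((x, u))       # label with the *expert's* action
        d = adversarial_disturbance(x, u, dt)
        x = x + (u + d) * dt      # disturbance pushes toward unsafe states
    return data

dataset = collect_trajectory()
```

Because the expert repeatedly recovers from the disturbance-induced drift toward the obstacle, the resulting dataset is richer in expert behavior near safety-critical states, which is the property the method exploits.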