For flexible yet safe imitation learning (IL), we propose theory and a modular method built around a safety layer. The layer enables a closed-form probability density and gradient for the safe generative continuous policy, end-to-end generative adversarial training, and worst-case safety guarantees. The safety layer maps all actions into a set of safe actions, and its density follows from the change-of-variables formula together with additivity of measures. The set of safe actions is inferred by first checking the safety of a finite sample of actions via adversarial reachability analysis of fallback maneuvers, and then extending the result to these actions' neighborhoods using, e.g., Lipschitz continuity. Our theoretical analysis shows the robustness advantage of using the safety layer already during training (imitation error linear in the horizon) over using it only at test time (imitation error up to quadratic in the horizon). In an experiment on real-world driver interaction data, we empirically demonstrate the tractability, safety, and imitation performance of our approach.
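To make the density computation concrete, the following is a minimal one-dimensional sketch, not the construction used in the paper. It assumes a hypothetical action space [a_min, a_max] containing a known safe interval, and a layer that keeps safe actions unchanged while mapping each unsafe piece affinely onto the safe interval. The names `SafetyLayer1D`, `affine_piece`, and `base_density` are illustrative. The density of the safe action is then a sum over pieces (additivity of measures), where each term is the base density at the preimage times the absolute Jacobian of the inverse map (change of variables).

```python
def affine_piece(src_lo, src_hi, dst_lo, dst_hi):
    """Affine bijection from [src_lo, src_hi] onto [dst_lo, dst_hi].

    Returns (forward, inverse, |d inverse / dy|).
    """
    scale = (dst_hi - dst_lo) / (src_hi - src_lo)
    forward = lambda a: dst_lo + (a - src_lo) * scale
    inverse = lambda y: src_lo + (y - dst_lo) / scale
    return forward, inverse, 1.0 / abs(scale)


class SafetyLayer1D:
    """Toy safety layer: maps every action in [a_min, a_max] into [safe_lo, safe_hi]."""

    def __init__(self, a_min, a_max, safe_lo, safe_hi):
        assert a_min < safe_lo < safe_hi < a_max
        self.safe_lo, self.safe_hi = safe_lo, safe_hi
        # The safe piece comes first so that safe actions are left unchanged.
        sources = [(safe_lo, safe_hi), (a_min, safe_lo), (safe_hi, a_max)]
        self.pieces = [
            (lo, hi) + affine_piece(lo, hi, safe_lo, safe_hi) for lo, hi in sources
        ]

    def __call__(self, a):
        """Map a (possibly unsafe) action to a safe action."""
        for lo, hi, forward, _, _ in self.pieces:
            if lo <= a <= hi:
                return forward(a)
        raise ValueError("action outside the action space")

    def density(self, y, base_density):
        """Closed-form density of the safe action y under the layer."""
        if not (self.safe_lo <= y <= self.safe_hi):
            return 0.0  # no probability mass outside the safe set
        # Sum over pieces: base density at the preimage times |Jacobian of inverse|.
        return sum(
            base_density(inverse(y)) * jac for _, _, _, inverse, jac in self.pieces
        )


def base_density(a):
    """Assumed pre-safety policy density on [-1, 1] (integrates to 1)."""
    return max(0.0, 0.75 * (1.0 - a * a))


layer = SafetyLayer1D(-1.0, 1.0, -0.2, 0.5)
safe_action = layer(0.9)                      # unsafe action mapped into [-0.2, 0.5]
p_safe = layer.density(safe_action, base_density)
```

Because every piece is a differentiable bijection, the resulting density stays differentiable in the parameters of the base policy, which is what allows gradient-based training through such a layer. In higher dimensions the scalar factor would be replaced by the absolute determinant of the Jacobian.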