Large-scale self-supervised models have recently revolutionized our ability to perform a variety of tasks within the vision and language domains. However, using such models for autonomous systems is challenging because of safety requirements: besides executing correct actions, an autonomous agent must also avoid costly and potentially fatal mistakes. Traditionally, self-supervised training focuses on imitating previously observed behaviors, and the training demonstrations carry no notion of which behaviors should be explicitly avoided. In this work, we propose the Control Barrier Transformer (ConBaT), an approach that learns safe behaviors from demonstrations in a self-supervised fashion. ConBaT is inspired by the concept of control barrier functions in control theory. It uses a causal transformer, paired with a critic that requires minimal safety-data labeling, to predict safe robot actions autoregressively. During deployment, we employ a lightweight online optimization to find actions that ensure future states lie within the learned safe set. We apply our approach to different simulated control tasks and show that it yields safer control policies than other classical and learning-based methods such as imitation learning, reinforcement learning, and model predictive control.
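To make the deployment-time idea concrete, the following is a minimal sketch of adjusting a proposed action until a discrete-time control-barrier condition holds. It is not the ConBaT implementation: the hand-written `barrier` stands in for the learned critic, the one-dimensional `step` function stands in for the learned model of future states, and the names `cbf_margin`, `safe_action`, `alpha`, and `lr` are all assumptions introduced here for illustration.

```python
# Illustrative sketch only. A fixed analytic barrier replaces the learned
# critic, and a 1-D integrator replaces the learned dynamics (both assumed).

def barrier(x):
    """h(x) >= 0 marks the safe set; here the safe set is |x| <= 1."""
    return 1.0 - x * x


def step(x, a, dt=0.1):
    """Toy dynamics: x_next = x + a * dt."""
    return x + a * dt


def cbf_margin(x, a, alpha=0.5):
    """Discrete-time barrier condition: h(x_next) - (1 - alpha) * h(x) >= 0."""
    return barrier(step(x, a)) - (1.0 - alpha) * barrier(x)


def safe_action(x, a_proposed, lr=0.5, max_iters=100, eps=1e-4):
    """Nudge the proposed action by gradient ascent on the margin
    until the barrier condition is satisfied (or iterations run out)."""
    a = a_proposed
    for _ in range(max_iters):
        if cbf_margin(x, a) >= 0.0:
            break
        # Central finite difference of the margin with respect to the action.
        grad = (cbf_margin(x, a + eps) - cbf_margin(x, a - eps)) / (2 * eps)
        a += lr * grad
    return a


if __name__ == "__main__":
    # Near the boundary, the proposed action is reduced.
    print(safe_action(0.95, 1.0))
    # Far from the boundary, the proposed action is returned unchanged.
    print(safe_action(0.0, 1.0))
```

In this toy setting the proposed action is left untouched whenever it already satisfies the condition, so the correction only intervenes near the boundary of the safe set. This mirrors the role the abstract describes for the online optimization, under the simplifying assumptions stated above.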