An outstanding challenge for the widespread deployment of robotic systems like autonomous vehicles is ensuring safe interaction with humans without sacrificing performance. Existing safety methods often neglect the robot's ability to learn and adapt at runtime, leading to overly conservative behavior. This paper proposes a new closed-loop paradigm for synthesizing safe control policies that explicitly account for the robot's evolving uncertainty and its ability to quickly respond to future scenarios as they arise, by jointly considering the physical dynamics and the robot's learning algorithm. We leverage adversarial reinforcement learning for tractable safety analysis under high-dimensional learning dynamics and demonstrate our framework's ability to work with both Bayesian belief propagation and implicit learning through large pre-trained neural trajectory predictors.