We introduce the Deep QP Safety Filter, a fully data-driven safety layer for black-box dynamical systems. Our method learns a Quadratic-Program (QP) safety filter without model knowledge by combining Hamilton-Jacobi (HJ) reachability with model-free learning. We construct contraction-based losses for both the safety value and its derivatives, and train two neural networks accordingly. In the exact setting, the learned critic converges to the viscosity solution (and its derivative), even when the value function is non-smooth. Across diverse dynamical systems, including a hybrid system, and multiple RL tasks, the Deep QP Safety Filter substantially reduces pre-convergence failures while accelerating learning toward higher returns than strong baselines, offering a principled and practical route to safe, model-free control.
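To make the idea of a QP safety filter concrete, the following is a minimal sketch of the generic construction, not the method of this work. It assumes a single affine safety constraint of the form a·u + b ≥ 0 on the control u; the names `qp_safety_filter`, `u_ref`, `a`, and `b` are hypothetical, and how a learned safety value and its derivatives would supply `a` and `b` is not specified by the abstract. Under that assumption the QP, minimizing ||u - u_ref||² subject to the constraint, has a closed-form solution: a Euclidean projection of the reference control onto a half-space.

```python
from typing import List, Sequence


def qp_safety_filter(u_ref: Sequence[float], a: Sequence[float], b: float) -> List[float]:
    """Solve  min_u ||u - u_ref||^2  s.t.  a . u + b >= 0  in closed form.

    Assumption (not from the source): the safety condition is a single
    affine inequality in the control, so the QP reduces to projecting
    u_ref onto the half-space {u : a . u + b >= 0}.
    """
    # Constraint value at the reference control.
    slack = sum(ai * ui for ai, ui in zip(a, u_ref)) + b
    if slack >= 0.0:
        # The reference control already satisfies the constraint.
        return list(u_ref)

    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        # a == 0 and b < 0: no control can satisfy the constraint.
        raise ValueError("safety constraint is infeasible for every control")

    # Minimal correction along a that lands exactly on the constraint boundary.
    scale = slack / norm_sq
    return [ui - scale * ai for ui, ai in zip(u_ref, a)]


if __name__ == "__main__":
    # The reference control violates -u[0] + 0.5 >= 0, so it is pulled back.
    print(qp_safety_filter([1.0, 0.0], [-1.0, 0.0], 0.5))
```

With several constraints or input bounds, the closed form no longer applies and a general QP solver would be needed instead.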