We introduce Deep QP Safety Filter, a fully data-driven safety layer for black-box dynamical systems. Our method learns a Quadratic-Program (QP) safety filter without model knowledge by combining Hamilton-Jacobi (HJ) reachability with model-free learning. We construct contraction-based losses for both the safety value and its derivatives, and train two neural networks accordingly. In the exact setting, the learned critic converges to the viscosity solution (and its derivative), even when the value function is non-smooth. Across diverse dynamical systems (including a hybrid system) and multiple RL tasks, Deep QP Safety Filter substantially reduces pre-convergence failures while accelerating learning and reaching higher returns than strong baselines, offering a principled and practical route to safe, model-free control.
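To illustrate what a QP safety filter does at run time, the following is a minimal sketch, not the method of the paper. It assumes the safety condition has already been reduced to a single affine constraint on the control, `a @ u + b >= 0`, where `a` and `b` are hypothetical quantities that a learned safety value and its derivatives might supply. Under that assumption, the QP that minimally modifies a nominal control `u_nom` has a closed-form solution: a projection onto a half-space. The function name `qp_safety_filter` and its parameters are illustrative only.

```python
import numpy as np


def qp_safety_filter(u_nom, a, b, eps=1e-12):
    """Solve  min_u ||u - u_nom||^2  s.t.  a @ u + b >= 0  in closed form.

    u_nom : nominal control proposed by the task policy
    a, b  : coefficients of one affine safety constraint (assumed given;
            in a learned filter they would come from trained networks)
    """
    u_nom = np.asarray(u_nom, dtype=float)
    a = np.asarray(a, dtype=float)

    # Constraint value at the nominal control; non-negative means safe.
    slack = float(a @ u_nom + b)
    if slack >= 0.0:
        return u_nom.copy()

    norm_sq = float(a @ a)
    if norm_sq < eps:
        # The constraint does not depend on u, so no control can fix it.
        # This sketch simply falls back to the nominal control.
        return u_nom.copy()

    # Project u_nom onto the boundary of the half-space {u : a @ u + b >= 0}.
    return u_nom - (slack / norm_sq) * a


# Usage: the nominal control violates the constraint and is minimally corrected.
u_safe = qp_safety_filter(u_nom=[1.0, 0.0], a=[-1.0, 0.0], b=0.5)
```

With several constraints or input bounds, the closed form no longer applies and a general QP solver would be needed instead.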