As autonomous systems become more ubiquitous in daily life, ensuring high performance with guaranteed safety is crucial. However, safety and performance can be competing objectives, making their co-optimization difficult. Learning-based methods, such as Constrained Reinforcement Learning (CRL), achieve strong performance but lack formal safety guarantees because safety is enforced only as a soft constraint, limiting their use in safety-critical settings. Conversely, formal methods such as Hamilton-Jacobi (HJ) Reachability Analysis and Control Barrier Functions (CBFs) provide rigorous safety assurances but often neglect performance, resulting in overly conservative controllers. To bridge this gap, we formulate the co-optimization of safety and performance as a state-constrained optimal control problem, where performance objectives are encoded via a cost function and safety requirements are imposed as state constraints. We show that the resulting value function satisfies a Hamilton-Jacobi-Bellman (HJB) equation, which we approximate efficiently using a novel physics-informed machine learning framework. In addition, we introduce a conformal prediction-based verification strategy to quantify the learning errors, recovering a high-confidence safety value function along with a probabilistic error bound on performance degradation. Through several case studies, we demonstrate the efficacy of the proposed framework in enabling scalable learning of safe and performant controllers for complex, high-dimensional autonomous systems.
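To make the physics-informed approximation concrete, the sketch below builds the HJB residual loss for a toy unconstrained instance of the problem class; the dynamics, cost, and function names are illustrative assumptions, not the paper's actual system. For the scalar system with dynamics x' = u and running cost l(x, u) = x² + u², the stationary HJB equation is 0 = min_u [x² + u² + V'(x)·u]; minimizing over u gives u* = -V'(x)/2 and the pointwise residual r(x) = x² - V'(x)²/4, which a physics-informed learner would drive to zero at sampled collocation points.

```python
import numpy as np

# Toy problem (illustrative, not the paper's system):
#   dynamics x' = u, running cost l(x, u) = x^2 + u^2, infinite horizon.
# HJB: 0 = min_u [x^2 + u^2 + V'(x) u]  =>  u* = -V'(x)/2
# => pointwise residual r(x) = x^2 - V'(x)^2 / 4.

def hjb_residual(x, dV):
    """HJB residual at x after minimizing analytically over the control."""
    return x**2 - dV**2 / 4.0

def pinn_loss(V_prime, xs):
    """Mean squared HJB residual over collocation points xs, as a
    physics-informed training loss would use (V_prime maps x -> V'(x))."""
    r = hjb_residual(xs, V_prime(xs))
    return float(np.mean(r**2))

# For this toy problem the exact value function is V(x) = x^2, so V'(x) = 2x
# and the residual vanishes identically -- a sanity check on the loss.
xs = np.linspace(-2.0, 2.0, 101)
loss_exact = pinn_loss(lambda x: 2.0 * x, xs)  # residual is identically zero
loss_wrong = pinn_loss(lambda x: 3.0 * x, xs)  # a wrong candidate: positive loss
print(loss_exact, loss_wrong)
```

In the full framework a neural network would play the role of `V_prime` (via automatic differentiation of the value network), and the state constraints would enter the residual as well; this sketch only shows the residual-as-loss mechanism.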
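The conformal-prediction verification step can be sketched with standard split conformal calibration: score the learned value function's error on a held-out calibration set, take the appropriate empirical quantile, and shift the learned values by that quantile to obtain a probabilistically conservative safety value. The ground-truth and learned value functions below are synthetic stand-ins introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def V_true(x):
    """Stand-in for the (unknown) true value function."""
    return x**2

def V_learned(x):
    """Stand-in for an imperfect learned approximation."""
    return x**2 + 0.1 * np.sin(5 * x) + 0.05 * rng.standard_normal(x.shape)

alpha = 0.1                                   # target miscoverage level
x_cal = rng.uniform(-2.0, 2.0, size=500)      # held-out calibration states
scores = np.abs(V_learned(x_cal) - V_true(x_cal))   # nonconformity scores
n = len(scores)
k = int(np.ceil((n + 1) * (1 - alpha)))       # split-conformal rank
q = np.sort(scores)[k - 1]                    # calibrated error bound

# With probability >= 1 - alpha on a fresh state, |V_learned - V_true| <= q,
# so V_learned(x) - q is a high-confidence (safe-side) value estimate.
x_test = rng.uniform(-2.0, 2.0, size=2000)
coverage = np.mean(np.abs(V_learned(x_test) - V_true(x_test)) <= q)
print(q, coverage)
```

The same calibrated bound q is what yields both the high-confidence safety value function (by conservatively shifting the learned values) and the probabilistic bound on performance degradation.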