We present EigenSafe, an operator-theoretic framework for safety assessment of learning-enabled stochastic systems. In many robotic applications, the dynamics are inherently stochastic due to factors such as sensing noise and environmental disturbances, and it is challenging for conventional methods such as Hamilton-Jacobi reachability and control barrier functions to provide a well-calibrated safety critic that is tied to the actual safety probability. We derive a linear operator that governs the dynamic programming principle for safety probability, and find that its dominant eigenpair provides critical safety information for both individual state-action pairs and the overall closed-loop system. The proposed framework learns this dominant eigenpair, which can be used to either inform or constrain policy updates. We demonstrate that the learned eigenpair effectively facilitates safe reinforcement learning. Further, robot manipulation experiments with a UR3 robotic arm on a food preparation task validate that the framework can enhance the safety of policies learned through imitation learning.
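To make the eigenpair idea concrete, the following is a minimal sketch on a small finite-state closed-loop Markov chain. This setting is an assumption made purely for illustration: the abstract does not give the operator's form, and the framework learns the eigenpair rather than computing it from a known matrix. The matrix `Q`, its entries, and the function names are hypothetical. Here `Q` holds the transition probabilities among safe states only, so each row sums to less than one and the missing mass is the per-step failure probability. In this finite case, the safety probability over a horizon `T` is `Q^T` applied to the all-ones vector, and for large `T` it decays roughly like `lam**T` scaled by the eigenvector entry of the starting state.

```python
import numpy as np

# Hypothetical sub-stochastic matrix: transitions among three safe states.
# Row sums are 0.98, 0.95, 0.80; the remainder is the chance of failing.
Q = np.array([
    [0.90, 0.08, 0.00],
    [0.10, 0.80, 0.05],
    [0.00, 0.20, 0.60],
])


def safety_probability(Q, horizon):
    """Probability of staying safe for `horizon` steps, per safe start state.

    Implements the linear recursion V_{t+1} = Q V_t with V_0 = 1.
    """
    v = np.ones(Q.shape[0])
    for _ in range(horizon):
        v = Q @ v
    return v


def dominant_eigenpair(Q, iters=1000, tol=1e-12):
    """Power iteration for the dominant eigenvalue and right eigenvector.

    The eigenvector is normalized so that its largest entry equals 1.
    Assumes Q is nonnegative, irreducible, and aperiodic.
    """
    psi = np.ones(Q.shape[0])
    lam = 0.0
    for _ in range(iters):
        nxt = Q @ psi
        new_lam = nxt.max()      # eigenvalue estimate under max-normalization
        nxt = nxt / new_lam
        converged = abs(new_lam - lam) < tol
        psi, lam = nxt, new_lam
        if converged:
            break
    return lam, psi


lam, psi = dominant_eigenpair(Q)
print("dominant eigenvalue (long-run per-step survival rate):", lam)
print("dominant eigenvector (relative safety of each state):", psi)

# For a long horizon, the one-step ratio of safety probabilities
# approaches the dominant eigenvalue from every safe start state.
v_long = safety_probability(Q, 200)
v_next = safety_probability(Q, 201)
print("one-step decay ratio at T=200:", v_next / v_long)
```

In this toy setting the eigenvalue summarizes the closed-loop system as a whole (values closer to 1 mean slower decay of the safety probability), while the eigenvector ranks individual states by relative long-run safety. This mirrors the two roles the abstract attributes to the dominant eigenpair, although the abstract's version is defined over state-action pairs.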