Statistical Reachability Analysis of Stochastic Cyber-Physical Systems under Distribution Shift

Reachability analysis is a popular method to give safety guarantees for stochastic cyber-physical systems (SCPSs) that takes in a symbolic description of the system dynamics and uses set-propagation methods to compute an overapproximation of the set of reachable states over a bounded time horizon. In this paper, we investigate the problem of performing reachability analysis for an SCPS that does not have a symbolic description of the dynamics, but instead is described using a digital twin model that can be simulated to generate system trajectories. An important challenge is that the simulator implicitly models a probability distribution over the set of trajectories of the SCPS; however, it is typical to have a sim2real gap, i.e., the actual distribution of the trajectories in a deployment setting may be shifted from the distribution assumed by the simulator. We thus propose a statistical reachability analysis technique that, given a user-provided threshold $1-\epsilon$, provides a set that guarantees that any reachable state during deployment lies in this set with probability not smaller than this threshold. Our method is based on three main steps: (1) learning a deterministic surrogate model from sampled trajectories, (2) conducting reachability analysis over the surrogate model, and (3) employing {\em robust conformal inference} using an additional set of sampled trajectories to quantify the surrogate model's distribution shift with respect to the deployed SCPS. To counter conservatism in reachable sets, we propose a novel method to train surrogate models that minimizes a quantile loss term (instead of the usual mean squared loss), and a new method that provides tighter guarantees using conformal inference using a normalized surrogate error. We demonstrate the effectiveness of our technique on various case studies.

翻译：可达性分析是一种为随机信息物理系统提供安全性保证的常用方法，该方法通过输入系统动力学的符号化描述，并利用集合传播技术计算有界时间范围内可达状态集的超近似。本文研究针对一类特殊随机信息物理系统的可达性分析问题：该系统没有动力学的符号化描述，而是通过可模拟生成系统轨迹的数字孪生模型进行刻画。一个关键挑战在于，仿真器隐式地对随机信息物理系统的轨迹集合建立了概率分布模型；然而通常存在仿真与现实之间的差距，即实际部署环境中的轨迹分布可能偏离仿真器假设的分布。为此，我们提出一种统计可达性分析技术：在给定用户设定阈值 $1-\epsilon$ 的条件下，该方法能够提供一个集合，并保证在部署过程中任何可达状态以不低于该阈值的概率落在此集合内。我们的方法基于三个核心步骤：(1) 从采样轨迹中学习确定性代理模型，(2) 对代理模型进行可达性分析，(3) 利用额外采样轨迹集进行鲁棒保形推断，以量化代理模型相对于部署随机信息物理系统的分布偏移。为降低可达集的保守性，我们提出一种训练代理模型的新方法：通过最小化分位数损失项（替代传统的均方损失），并设计了一种基于归一化代理误差的保形推断方法以获得更紧致的保证界限。我们在多个案例研究中验证了所提技术的有效性。