Accurate estimates of long-term risk probabilities and their gradients are critical for many stochastic safe control methods. However, computing such risk probabilities in real time and in unseen or changing environments is challenging. Monte Carlo (MC) methods cannot accurately evaluate the probabilities and their gradients, because an infinitesimal divisor can amplify the sampling noise. In this paper, we develop an efficient method to evaluate the probabilities of long-term risk and their gradients. The proposed method exploits the fact that the long-term risk probability satisfies certain partial differential equations (PDEs), which characterize the neighboring relations between the probabilities, to integrate MC methods and physics-informed neural networks. We provide theoretical guarantees on the estimation error given certain choices of training configurations. Numerical results show that the proposed method has better sample efficiency, generalizes well to unseen regions, and can adapt to systems with changing parameters. The proposed method can also accurately estimate the gradients of risk probabilities, which enables first- and second-order techniques on risk probabilities to be used for learning and control.
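The noise-amplification issue with MC gradient estimates can be illustrated with a small experiment. The sketch below is not the paper's method or setup: it assumes a one-dimensional driftless diffusion dX = sigma dW, a risk event defined as the state reaching a barrier within a finite horizon, and a central finite-difference gradient built from independent MC estimates. All function names and parameter values (`mc_risk`, `fd_gradient`, `gradient_spread`, the barrier, the horizon, the step sizes) are hypothetical choices made for illustration.

```python
import numpy as np


def mc_risk(x0, n_paths, rng, horizon=1.0, dt=0.01, sigma=1.0, barrier=1.0):
    """MC estimate of P(X reaches `barrier` within `horizon` | X_0 = x0).

    Assumed toy dynamics: dX = sigma dW, simulated with Euler-Maruyama.
    """
    n_steps = int(round(horizon / dt))
    # Independent Gaussian increments for every path and time step.
    steps = sigma * np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    paths = x0 + np.cumsum(steps, axis=1)
    # A path is "unsafe" if it starts at or crosses the barrier.
    hit = (paths.max(axis=1) >= barrier) | (x0 >= barrier)
    return float(hit.mean())


def fd_gradient(x0, h, n_paths, rng):
    """Central finite difference of two independent MC risk estimates."""
    upper = mc_risk(x0 + h, n_paths, rng)
    lower = mc_risk(x0 - h, n_paths, rng)
    # The sampling noise in (upper - lower) is divided by 2h.
    return (upper - lower) / (2.0 * h)


def gradient_spread(h, n_paths=2000, n_repeats=30, seed=0):
    """Standard deviation of the gradient estimate over repeated runs."""
    rng = np.random.default_rng(seed)
    grads = [fd_gradient(0.0, h, n_paths, rng) for _ in range(n_repeats)]
    return float(np.std(grads))


spread_coarse = gradient_spread(h=0.2)
spread_fine = gradient_spread(h=0.002)
print(f"std of gradient estimate, h=0.2:   {spread_coarse:.3f}")
print(f"std of gradient estimate, h=0.002: {spread_fine:.3f}")
```

In this toy setting the noise in each probability estimate does not depend on the step size h, so the spread of the gradient estimate grows roughly in proportion to 1/h as h shrinks. Reducing h to lower the finite-difference bias therefore requires many more samples to keep the variance under control.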