Accurate estimation of long-term risk is critical for safe decision-making, but sampling rare risk events and long-term trajectories can be prohibitively costly. Risk gradients are useful in many first-order methods for learning and control, but they are difficult to estimate with Monte Carlo (MC) methods because the infinitesimal divisor can significantly amplify sampling noise. Motivated by this gap, we propose an efficient method for evaluating long-term risk probabilities and their gradients from short-term samples that need not contain sufficient risk events. We first show that four types of long-term risk probability are solutions of certain partial differential equations (PDEs). We then propose a physics-informed learning technique that integrates data with physics information (the aforementioned PDEs). The physics information propagates information beyond the available data and yields provable generalization, which in turn allows long-term risk to be estimated from short-term samples of safe events. Finally, we demonstrate in simulation that the proposed technique improves sample efficiency, generalizes well to unseen regions, and adapts to changing system parameters.
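To make the noise-amplification issue concrete, the following minimal sketch estimates a risk gradient by finite differences of two independent MC estimates and reports how the spread of that estimate grows as the step size shrinks. It is an illustration only, not the method proposed here: the one-step Gaussian toy model, the barrier value, and all function names (`mc_risk`, `fd_gradient`, `gradient_spread`, `exact_gradient`) are assumptions introduced for this example.

```python
import math
import random
import statistics


def mc_risk(x, n, rng, barrier=2.0, scale=1.0):
    """MC estimate of the toy risk P(x + scale * Z > barrier), Z ~ N(0, 1)."""
    hits = sum(1 for _ in range(n) if x + scale * rng.gauss(0.0, 1.0) > barrier)
    return hits / n


def fd_gradient(x, h, n, rng):
    """Forward finite difference built from two independent MC estimates."""
    return (mc_risk(x + h, n, rng) - mc_risk(x, n, rng)) / h


def gradient_spread(x, h, n, repeats, seed=0):
    """Standard deviation of the finite-difference estimate over repeated runs."""
    rng = random.Random(seed)
    estimates = [fd_gradient(x, h, n, rng) for _ in range(repeats)]
    return statistics.stdev(estimates)


def exact_gradient(x, barrier=2.0, scale=1.0):
    """Closed-form derivative of the toy risk probability, for reference."""
    z = (barrier - x) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    # Smaller steps divide the same sampling noise by a smaller h.
    for h in (0.5, 0.1, 0.02):
        print("h =", h, "spread =", gradient_spread(x=0.0, h=h, n=2000, repeats=50))
    print("exact gradient =", exact_gradient(0.0))
```

In this toy setting the sampling noise of each MC estimate is roughly independent of the step size `h`, so dividing the difference by a smaller `h` inflates the spread of the gradient estimate roughly in proportion to `1/h`.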