Real-world accidents in learning-enabled cyber-physical systems (CPS) frequently occur in challenging corner cases. When training a deep reinforcement learning (DRL) policy, the standard setup either fixes the training conditions at a single initial state or samples them uniformly from the admissible state space. This setup often overlooks the challenging but safety-critical corner cases. To bridge this gap, this paper proposes a physics-model-guided worst-case sampling strategy for training safe policies that can handle safety-critical corner cases with guaranteed safety. Furthermore, we integrate the proposed worst-case sampling strategy into the physics-regulated deep reinforcement learning (Phy-DRL) framework to build a more data-efficient and safer learning algorithm for safety-critical CPS. We validate the proposed training strategy with Phy-DRL through extensive experiments on a simulated cart-pole system, a 2D quadrotor, and both a simulated and a real quadruped robot, showing remarkably improved sampling efficiency in learning more robust safe policies.
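As a rough illustration of what physics-model-guided worst-case sampling can look like, the sketch below draws training initial states from the boundary of an ellipsoidal safety envelope {x : xᵀPx ≤ 1}, where P would come from a physics model (e.g., a Lyapunov-like matrix). This is a minimal hypothetical example under those assumptions, not the paper's exact procedure; the function name `sample_worst_case` and the toy matrix `P` are illustrative.

```python
import numpy as np

def sample_worst_case(P, n_samples, rng=None):
    """Sample initial states on the boundary of the safety envelope
    {x : x^T P x = 1}. Assumes P is symmetric positive definite,
    as supplied by a physics model (hypothetical illustration)."""
    rng = np.random.default_rng() if rng is None else rng
    d = P.shape[0]
    # Draw uniformly random directions on the unit sphere.
    v = rng.normal(size=(n_samples, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    # Rescale each direction so that x^T P x = 1 (envelope boundary),
    # i.e., the "worst-case" states farthest from the origin that the
    # safety envelope still admits.
    scale = np.sqrt(np.einsum('ij,jk,ik->i', v, P, v))
    return v / scale[:, None]

# Toy 2-state system with an ellipsoidal safety envelope.
P = np.array([[2.0, 0.3],
              [0.3, 1.0]])
X = sample_worst_case(P, 5, rng=np.random.default_rng(0))
```

Policies trained from such boundary states are exposed to the hardest admissible conditions rather than the easy interior states that uniform sampling visits most often.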