Reinforcement learning (RL) is an area of significant research interest, and safe RL in particular is attracting attention because it can handle the safety-driven constraints that are crucial for real-world applications of RL algorithms. This work proposes a novel approach to RL training, called control invariant set (CIS) enhanced RL, which uses a CIS to improve stability guarantees and sampling efficiency. The approach consists of two learning stages: offline and online. In the offline stage, the CIS is incorporated into the reward design, the initial state sampling, and the state reset procedure. In the online stage, the RL agent is retrained whenever the state is outside the CIS, so that CIS membership serves as the stability criterion. A backup table built from the explicit form of the CIS is used to ensure stability during online operation. To evaluate the proposed approach, we apply it to a simulated chemical reactor. The results show a significant improvement in sampling efficiency during offline training and closed-loop stability in the online implementation.
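The way a CIS can enter the offline and online stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the box-shaped CIS (`CIS_BOUNDS`), the fixed `penalty`, the quadratic base reward, and all function names are hypothetical choices made only for illustration, and the backup table is not modelled.

```python
import random

# Hypothetical CIS: the box |x_i| <= bound_i. The actual CIS of the
# reactor example is not specified in this text.
CIS_BOUNDS = (1.0, 0.5)


def in_cis(state, bounds=CIS_BOUNDS):
    """Return True if every state component lies inside the box."""
    return all(abs(x) <= b for x, b in zip(state, bounds))


def sample_initial_state(rng, bounds=CIS_BOUNDS):
    """Offline stage: draw the initial state uniformly from inside the CIS."""
    return tuple(rng.uniform(-b, b) for b in bounds)


def shaped_reward(next_state, base_reward, penalty=10.0):
    """Offline stage: subtract an assumed fixed penalty if the CIS is left."""
    return base_reward if in_cis(next_state) else base_reward - penalty


def offline_step(state, action, dynamics, rng):
    """One training transition: step, shape the reward, reset if needed."""
    next_state = dynamics(state, action)
    # Assumed base reward: negative squared distance to the origin.
    base = -sum(x * x for x in next_state)
    reward = shaped_reward(next_state, base)
    if not in_cis(next_state):
        # State reset: restart the episode from a point inside the CIS.
        next_state = sample_initial_state(rng)
    return next_state, reward


def needs_retraining(state):
    """Online stage: a state outside the CIS triggers retraining."""
    return not in_cis(state)
```

In this sketch every training transition starts and, after a reset, continues inside the CIS, which is one plausible reading of how the CIS could reduce the number of uninformative samples.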