Reinforcement learning (RL) has demonstrated impressive performance in areas such as video games and robotics. However, ensuring safety and stability, two critical properties from a control perspective, remains a significant challenge when RL is used to control real-world systems. In this paper, we first define safety and stability for an RL system. We then combine control barrier function (CBF) and control Lyapunov function (CLF) methods with the actor-critic method in RL to propose a Barrier-Lyapunov Actor-Critic (BLAC) framework that helps maintain both properties. In this framework, CBF constraints for safety and a CLF constraint for stability are constructed from data sampled from the replay buffer, and the augmented Lagrangian method is used to update the parameters of the RL-based controller. In addition, a backup controller is introduced for cases in which the safety and stability constraints cannot be satisfied simultaneously and the RL-based controller therefore cannot provide valid control signals. Simulation results show that the framework yields a controller that helps the system approach the desired state and causes fewer safety-constraint violations than baseline algorithms.
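To make the constraint construction and the augmented Lagrangian update more concrete, the following is a minimal sketch rather than the paper's implementation. The abstract does not state the exact constraint forms, so the sketch assumes the commonly used discrete-time conditions h(x') - h(x) + eta*h(x) >= 0 for the CBF and V(x') - V(x) + beta*V(x) <= 0 for the CLF, evaluated on transitions sampled from a replay buffer. All function names, the coefficients `eta` and `beta`, and the penalty weight `rho` are hypothetical.

```python
# Illustrative sketch only: the constraint forms and all names are assumptions.

def cbf_residual(h_now, h_next, eta=0.1):
    """Assumed discrete-time CBF condition h(x') - h(x) + eta*h(x) >= 0,
    rewritten as g <= 0 so that a positive value indicates a violation."""
    return -(h_next - h_now + eta * h_now)


def clf_residual(v_now, v_next, beta=0.1):
    """Assumed discrete-time CLF condition V(x') - V(x) + beta*V(x) <= 0.
    A positive value indicates a violation."""
    return v_next - v_now + beta * v_now


def augmented_lagrangian_term(g, lam, rho):
    """Standard augmented Lagrangian term for an inequality g <= 0
    with multiplier lam >= 0 and penalty weight rho > 0."""
    shifted = max(0.0, lam + rho * g)
    return (shifted ** 2 - lam ** 2) / (2.0 * rho)


def update_multiplier(g, lam, rho):
    """Projected multiplier update: grows under violation, never below zero."""
    return max(0.0, lam + rho * g)


def batch_penalty(batch, lam_cbf, lam_clf, rho, eta=0.1, beta=0.1):
    """Average the residuals over sampled transitions and return the penalty
    that would be added to the actor loss, plus the updated multipliers.
    Each batch item is a tuple (h_now, h_next, v_now, v_next)."""
    n = len(batch)
    g_cbf = sum(cbf_residual(h0, h1, eta) for h0, h1, _, _ in batch) / n
    g_clf = sum(clf_residual(v0, v1, beta) for _, _, v0, v1 in batch) / n
    penalty = (augmented_lagrangian_term(g_cbf, lam_cbf, rho)
               + augmented_lagrangian_term(g_clf, lam_clf, rho))
    new_lam_cbf = update_multiplier(g_cbf, lam_cbf, rho)
    new_lam_clf = update_multiplier(g_clf, lam_clf, rho)
    return penalty, new_lam_cbf, new_lam_clf
```

In a full actor-critic loop, the returned penalty would be added to the actor objective and differentiated with respect to the policy parameters. Plain floats are used here only to keep the arithmetic visible. A backup controller, as described above, would take over when the residuals cannot both be driven to non-positive values.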