Reinforcement learning (RL) exhibits impressive performance on complicated robot control tasks. However, its wide application to physical robots is limited by the absence of strong safety guarantees. To overcome this challenge, this paper explores the control Lyapunov barrier function (CLBF) to analyze safety and reachability solely from data, without explicitly employing a dynamic model. We also propose the Lyapunov barrier actor-critic (LBAC), a model-free RL algorithm, to search for a controller that satisfies the data-based approximation of the safety and reachability conditions. The proposed approach is demonstrated through simulation and real-world robot control experiments, namely a 2D quadrotor navigation task. The experimental findings show the approach's effectiveness in reachability and safety, surpassing other model-free RL methods.