Rapid recent progress in reinforcement learning algorithms has opened new possibilities in many fields. However, because these algorithms explore, their risks must be taken into account when they are applied to safety-critical problems, especially in real environments. In this study, we address the safe exploration problem in reinforcement learning in the presence of disturbances. We define safety during learning as satisfaction of constraints stated explicitly in terms of the state, and we propose a safe exploration method that uses partial prior knowledge of the controlled object and the disturbance. The proposed method guarantees that the explicit state constraints are satisfied with a pre-specified probability even when the controlled object is subject to a stochastic disturbance following a normal distribution. As theoretical results, we give sufficient conditions for constructing the conservative inputs used in the proposed method, which contain no exploratory component, and we prove that the method guarantees safety in the above sense. Furthermore, we illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.
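To make the idea of satisfying a state constraint with a pre-specified probability under a normally distributed disturbance more concrete, the following is a minimal sketch and not the method of this study. It assumes a hypothetical scalar one-step model x_next = a*x + b*u + w with w ~ N(0, sigma^2) and fully known a, b, and sigma, whereas the study uses only partial prior knowledge. The names `is_exploration_safe`, `select_input`, `x_max`, and `eta` are illustrative assumptions. The exploratory input is accepted only if the predicted state stays below the bound with probability at least `eta`; otherwise a conservative input, assumed to be supplied by the caller, is applied.

```python
from statistics import NormalDist


def is_exploration_safe(x, u, a, b, sigma, x_max, eta):
    """Check P(a*x + b*u + w <= x_max) >= eta for w ~ N(0, sigma^2).

    Assumed scalar linear model (hypothetical, for illustration only).
    """
    mean_next = a * x + b * u
    # Tighten the bound by the eta-quantile of the disturbance.
    margin = NormalDist().inv_cdf(eta) * sigma
    return mean_next + margin <= x_max


def select_input(x, u_explore, u_conservative, **model):
    """Use the exploratory input only when the chance constraint holds."""
    if is_exploration_safe(x, u_explore, **model):
        return u_explore
    # Fall back to a caller-supplied input that has no exploratory component.
    return u_conservative


if __name__ == "__main__":
    model = dict(a=1.0, b=1.0, sigma=0.1, x_max=1.0, eta=0.95)
    print(select_input(0.5, 0.3, -0.1, **model))  # exploratory input accepted
    print(select_input(0.5, 0.4, -0.1, **model))  # conservative fallback used
```

In this sketch the safety check reduces to shifting the bound by a Gaussian quantile, which is only valid under the assumed linear model with a known disturbance variance.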