Reinforcement learning (RL) is effective in many robotic applications, but it requires extensive exploration of the state-action space, during which behaviors can be unsafe. This significantly limits its applicability to large robots with complex actuators operating on unstable terrain. Hence, to design a safe goal-reaching control framework for large-scale robots, this paper decomposes the whole system into a set of tightly coupled functional modules. 1) A real-time visual pose estimation approach provides accurate robot states to 2) an RL motion planner for goal-reaching tasks that explicitly respects the robot's specifications. The RL module generates smooth, real-time motion commands for the actuator system, independent of its underlying dynamic complexity. 3) Within the actuation mechanism, a supervised deep learning model is trained to capture the robot's complex dynamics; this learned model is supplied to 4) a model-based robust adaptive controller that guarantees the wheels track the RL motion commands even on slip-prone terrain. 5) Finally, to reduce human intervention, a mathematical safety supervisor monitors the robot, stops it when unsafe faults occur, and autonomously guides it back to a safe inspection area. The proposed framework guarantees uniform exponential stability of the actuation system and the safety of the whole operation. Experiments on a 6,000 kg robot in different scenarios confirm the effectiveness of the proposed framework.
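To make the five-module decomposition concrete, the following is a minimal Python sketch of how one control cycle might wire the modules together. Every class name, method signature, and placeholder body is a hypothetical assumption for exposition only, not the paper's implementation.

```python
# Illustrative sketch of the five-module decomposition described above.
# All names and placeholder logic are assumptions, not the paper's API.

from dataclasses import dataclass


@dataclass
class RobotState:
    """Planar pose returned by the visual estimator."""
    x: float
    y: float
    yaw: float


class PoseEstimator:
    """Module 1: real-time visual pose estimation (stubbed)."""
    def estimate(self) -> RobotState:
        return RobotState(0.0, 0.0, 0.0)  # placeholder measurement


class RLPlanner:
    """Module 2: RL goal-reaching planner emitting smooth motion commands."""
    def plan(self, state: RobotState, goal: RobotState) -> tuple[float, float]:
        # A trained policy network would run here; a proportional
        # stand-in keeps the sketch self-contained.
        return 0.5 * (goal.x - state.x), 0.5 * (goal.y - state.y)


class DynamicsModel:
    """Module 3: supervised deep model of the actuator dynamics (stubbed)."""
    def predict(self, command: tuple[float, float]) -> tuple[float, float]:
        return command  # placeholder forward model


class AdaptiveController:
    """Module 4: model-based robust adaptive wheel-tracking controller."""
    def __init__(self, model: DynamicsModel) -> None:
        self.model = model

    def track(self, command: tuple[float, float]) -> tuple[float, float]:
        # The real controller would use self.model to compensate wheel
        # slip; this pass-through only marks where that would happen.
        return self.model.predict(command)


class SafetySupervisor:
    """Module 5: monitors for unsafe faults and triggers stop/recovery."""
    def is_safe(self, state: RobotState) -> bool:
        return True  # placeholder fault check


def control_step(goal, estimator, planner, controller, supervisor):
    """One cycle of the loop: estimate -> supervise -> plan -> track."""
    state = estimator.estimate()
    if not supervisor.is_safe(state):
        return (0.0, 0.0)  # stop on unsafe fault; recovery routine omitted
    return controller.track(planner.plan(state, goal))


if __name__ == "__main__":
    cmd = control_step(RobotState(5.0, 3.0, 0.0), PoseEstimator(),
                       RLPlanner(), AdaptiveController(DynamicsModel()),
                       SafetySupervisor())
    print(cmd)  # motion command for this cycle
```

The sketch only shows the data flow among the modules; in the actual system each stub would be replaced by the corresponding learned model or controller, and the safety supervisor would additionally invoke the autonomous return-to-inspection-area routine rather than merely commanding a stop.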