In this paper, we present an approach for learning collision-free robot trajectories in the presence of moving obstacles. As a first step, we train a backup policy to generate evasive movements from arbitrary initial robot states using model-free reinforcement learning. When learning policies for other tasks, the backup policy can be used to estimate the potential risk of a collision and to offer an alternative action if the estimated risk is considered too high. No matter which action is selected, our action space ensures that the kinematic limits of the robot joints are not violated. We analyze and evaluate two different methods for estimating the risk of a collision. A physics simulation performed in the background is computationally expensive but provides the best results in deterministic environments. If a data-based risk estimator is used instead, the computational effort is significantly reduced, but an additional source of error is introduced. For evaluation, we successfully learn a reaching task and a basketball task while keeping the risk of collisions low. The results demonstrate the effectiveness of our approach for deterministic and stochastic environments, including a human-robot scenario and a ball environment, where no state can be considered permanently safe. By conducting experiments with a real robot, we show that our approach can generate safe trajectories in real time.
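The risk-gated action selection described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the one-dimensional toy world, the names `State`, `step`, `rollout_risk`, and `select_action`, the constants, and the hand-coded `backup_policy` (the paper learns its backup policy with model-free reinforcement learning). The velocity clamp in `step` only stands in for an action space that respects kinematic limits. The risk estimate follows the background-simulation idea: apply the candidate action, then roll out the backup policy and check whether a collision occurs.

```python
from dataclasses import dataclass

# Hypothetical toy constants; not taken from the paper.
DT = 0.1          # simulation time step
MAX_SPEED = 1.0   # stand-in for a kinematic joint limit
RADIUS = 0.2      # collision distance between robot and obstacle


@dataclass(frozen=True)
class State:
    robot: float         # robot position on a line
    obstacle: float      # obstacle position on the same line
    obstacle_vel: float  # constant obstacle velocity


def step(s: State, action: float) -> State:
    """Advance the toy world by one step; the action is a clamped velocity."""
    a = max(-MAX_SPEED, min(MAX_SPEED, action))
    return State(s.robot + a * DT, s.obstacle + s.obstacle_vel * DT, s.obstacle_vel)


def in_collision(s: State) -> bool:
    return abs(s.robot - s.obstacle) < RADIUS


def backup_policy(s: State) -> float:
    """Hand-coded evasive stand-in: move away from the obstacle at full speed."""
    return MAX_SPEED if s.robot >= s.obstacle else -MAX_SPEED


def make_task_policy(goal: float):
    """Hypothetical task policy: move toward a goal, ignoring obstacles."""
    def policy(s: State) -> float:
        return MAX_SPEED if goal > s.robot else -MAX_SPEED
    return policy


def rollout_risk(s: State, action: float, horizon: int = 20) -> float:
    """Return 1.0 if taking `action` and then following the backup policy
    for `horizon` steps leads to a collision in the simulated rollout, else 0.0."""
    s = step(s, action)
    if in_collision(s):
        return 1.0
    for _ in range(horizon):
        s = step(s, backup_policy(s))
        if in_collision(s):
            return 1.0
    return 0.0


def select_action(s: State, task_policy, risk_threshold: float = 0.5):
    """Use the task action unless its estimated risk exceeds the threshold;
    otherwise fall back to the backup action. Returns (action, used_backup)."""
    action = task_policy(s)
    if rollout_risk(s, action) > risk_threshold:
        return backup_policy(s), True
    return action, False


if __name__ == "__main__":
    reach = make_task_policy(goal=1.0)
    print(select_action(State(0.0, 0.25, 0.0), reach))  # obstacle close to the robot
    print(select_action(State(0.0, 2.0, 0.0), reach))   # obstacle far from the robot
```

In this sketch, a data-based risk estimator would be a learned function with the same signature as `rollout_risk`, which removes the rollout cost at the price of prediction errors.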