We present a deep reinforcement learning (deep RL) algorithm that combines learning-based motion planning with motion imitation to tackle challenging control problems. Deep RL has been an effective tool for many high-dimensional continuous control problems, but it struggles with problems that have certain properties, such as sparse reward functions or sensitive dynamics. In this work, we propose an approach that decomposes the given problem into two deep RL stages: motion planning and motion imitation. The motion planning stage computes a feasible motion plan by leveraging the planning capability of deep RL. The motion imitation stage then learns a control policy that imitates the resulting motion plan under realistic sensor and actuation models. This formulation adds only a nominal cost for the user because both stages require minimal changes to the original problem. We demonstrate that our approach can solve challenging control problems, such as rocket navigation and quadrupedal locomotion, that cannot be solved by the monolithic deep RL formulation or by the version that uses a Probabilistic Roadmap.