Synthesizing planning and control policies is a fundamental task in robotics, and it is further complicated by factors such as complex logic specifications and high-dimensional robot dynamics. This paper presents a novel reinforcement learning approach that solves high-dimensional robot navigation tasks with complex logic specifications by co-learning planning and control policies. Notably, this approach significantly reduces the sample complexity of training, allowing us to train high-quality policies with far fewer samples than existing reinforcement learning algorithms. In addition, our methodology streamlines the extraction of complex specifications from map images and enables the efficient generation of long-horizon robot motion paths across different map layouts. Moreover, our approach demonstrates the ability to handle high-dimensional control and to avoid suboptimal policies through policy alignment. We demonstrate the efficacy of our approach through experiments involving simulated high-dimensional quadruped robot dynamics and a real-world differential-drive robot (TurtleBot3) under different types of task specifications.