Coverage path planning is the problem of finding the shortest path that covers the entire free space of a given confined area, with applications ranging from robotic lawn mowing and vacuum cleaning to demining and search-and-rescue tasks. While offline methods can find provably complete, and in some cases optimal, paths for known environments, their value is limited in online scenarios where the environment is not known beforehand, especially in the presence of non-static obstacles. We propose an end-to-end reinforcement learning-based approach, operating in continuous state and action spaces, to the online coverage path planning problem, in which the environment is unknown. We construct the observation space from both global maps and local sensory inputs, allowing the agent to plan a long-term path while simultaneously acting on short-term obstacle detections. To account for large-scale environments, we propose a multi-scale map input representation. Furthermore, we propose a novel total variation reward term for eliminating thin strips of uncovered space in the learned path. To validate the effectiveness of our approach, we perform extensive experiments in simulation with a distance sensor, surpassing the performance of a recent reinforcement learning-based approach.
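The abstract names a total variation reward term but does not give its formula. As a rough illustration of the idea only, the sketch below assumes a binary 2D coverage grid and an anisotropic total variation (the sum of absolute differences between neighbouring cells). The function names `total_variation` and `tv_reward`, the `weight` parameter, and the grid layout are hypothetical and are not taken from the paper.

```python
import numpy as np


def total_variation(coverage: np.ndarray) -> float:
    """Anisotropic total variation of a 2D map (assumed definition).

    Sums the absolute differences between vertically and horizontally
    adjacent cells. Thin uncovered strips create many covered/uncovered
    boundaries and therefore a high value.
    """
    c = coverage.astype(float)
    vertical = np.abs(np.diff(c, axis=0)).sum()
    horizontal = np.abs(np.diff(c, axis=1)).sum()
    return float(vertical + horizontal)


def tv_reward(prev_coverage: np.ndarray,
              new_coverage: np.ndarray,
              weight: float = 1.0) -> float:
    """Hypothetical reward term: positive when a step lowers the
    total variation of the coverage map, negative when it raises it."""
    return weight * (total_variation(prev_coverage)
                     - total_variation(new_coverage))


# A 5x5 map that is covered except for a one-cell-wide strip (column 2).
before = np.ones((5, 5))
before[:, 2] = 0

# The same map after the agent has covered the strip.
after = np.ones((5, 5))

strip_tv = total_variation(before)
fill_reward = tv_reward(before, after)
```

Under these assumptions, filling in the strip removes every covered/uncovered boundary, so the step is rewarded, while a step that leaves a new thin gap behind would be penalized. How the actual method scales or combines this term with its other rewards is not stated in the abstract.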