Coverage path planning (CPP) is a critical problem in robotics, where the goal is to find an efficient path that covers every point in an area of interest. This work addresses the power-constrained CPP problem with recharge for battery-limited unmanned aerial vehicles (UAVs). A key challenge in this problem is integrating recharge journeys into the overall coverage strategy, which requires strategic, long-term decision-making. We propose a novel proximal policy optimization (PPO)-based deep reinforcement learning (DRL) approach with map-based observations, using action masking and discount factor scheduling to optimize coverage trajectories over the entire mission horizon. We further provide the agent with a position history to handle emergent state loops caused by the recharge capability. Our approach outperforms a baseline heuristic and generalizes to different target zones and maps, although its generalization to unseen maps is limited. We offer valuable insights into DRL algorithm design for long-horizon problems and provide a publicly available software framework for the CPP problem.
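To make the two training techniques named above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes action masking is applied by setting the logits of invalid actions to negative infinity before the softmax, and that discount factor scheduling moves the discount factor linearly from a lower to a higher value over training. The function names, the linear schedule, its direction, and the default values of `gamma_start` and `gamma_end` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def masked_action_probs(logits, valid_mask):
    """Softmax over logits with invalid actions forced to probability zero.

    Hypothetical helper: one common way to realize action masking.
    """
    logits = np.asarray(logits, dtype=np.float64)
    valid_mask = np.asarray(valid_mask, dtype=bool)
    if not valid_mask.any():
        raise ValueError("at least one action must be valid")
    # Invalid actions get a logit of -inf, so exp() maps them to exactly 0.
    masked = np.where(valid_mask, logits, -np.inf)
    # Subtract the maximum valid logit for numerical stability.
    exp = np.exp(masked - masked.max())
    return exp / exp.sum()


def scheduled_gamma(step, total_steps, gamma_start=0.95, gamma_end=0.999):
    """Linearly interpolate the discount factor over training.

    Assumed schedule and values; the abstract does not specify them.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)  # clip progress to [0, 1]
    return gamma_start + frac * (gamma_end - gamma_start)


if __name__ == "__main__":
    # Action 1 is invalid (for example, it would leave the map).
    probs = masked_action_probs([1.0, 2.0, 3.0], [True, False, True])
    print(probs)
    print(scheduled_gamma(0, 100), scheduled_gamma(50, 100), scheduled_gamma(100, 100))
```

Under these assumptions, a masked action can never be sampled, and a discount factor that increases during training lets the agent first learn short-term behavior before it has to weigh long-horizon consequences such as recharge journeys.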