This article presents a deep reinforcement learning (DRL) approach to a persistent surveillance mission. In this mission, a single unmanned aerial vehicle (UAV) is initially stationed at a depot, is subject to fuel or time-of-flight constraints, and must repeatedly visit a set of equal-priority targets. Because of these constraints, the vehicle must periodically return to the depot to refuel or recharge its battery. The objective is to find a sequence of target visits that minimizes the maximum time elapsed between successive visits to any target, while ensuring that the vehicle never runs out of fuel or charge. We present a DRL algorithm for this problem and report numerical experiments that demonstrate its effectiveness compared with common-sense greedy heuristics.
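To make the objective and the fuel constraint concrete, here is a minimal sketch. It is not the authors' implementation and rests on several assumptions. The depot and targets are points in the plane. The vehicle flies at unit speed, so travel time equals distance. Fuel capacity is expressed in the same distance units. Refueling at the depot (index 0) is instantaneous.

`simulate` scores a visit sequence by its maximum revisit time and rejects any sequence that runs out of fuel. `greedy_route` is one hypothetical greedy baseline: it flies to the most overdue target it can reach while keeping enough fuel to get back to the depot. Both function names and the greedy rule are illustrative and are not taken from the article.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
DEPOT = 0  # assumption: index 0 is the depot, indices 1..n are targets


def dist(a: Point, b: Point) -> float:
    # Straight-line distance; with unit speed this is also the flight time.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def simulate(points: List[Point], route: List[int], fuel_cap: float) -> Optional[float]:
    """Fly `route` starting from the depot with a full tank.

    Returns the maximum time between successive visits to any target, or
    None if the vehicle runs out of fuel. The time before a target's first
    visit is not counted, since only successive visits define a gap.
    """
    t, fuel, pos = 0.0, fuel_cap, DEPOT
    last: Dict[int, float] = {}  # target -> time of its most recent visit
    worst = 0.0
    for nxt in route:
        d = dist(points[pos], points[nxt])
        fuel -= d
        if fuel < -1e-9:  # infeasible: the tank ran dry mid-flight
            return None
        t += d
        pos = nxt
        if nxt == DEPOT:
            fuel = fuel_cap  # assumption: instant full refuel at the depot
        else:
            if nxt in last:
                worst = max(worst, t - last[nxt])
            last[nxt] = t
    return worst


def greedy_route(points: List[Point], fuel_cap: float, steps: int) -> List[int]:
    """Hypothetical greedy baseline: visit the most overdue reachable target.

    A target is reachable only if the vehicle can fly there and still
    return to the depot, so the route never becomes fuel-infeasible.
    """
    t, fuel, pos = 0.0, fuel_cap, DEPOT
    # Assumption: every target is treated as last visited at time 0.
    last = {j: 0.0 for j in range(1, len(points))}
    route: List[int] = []
    for _ in range(steps):
        feasible = [
            j for j in last
            if j != pos
            and dist(points[pos], points[j]) + dist(points[j], points[DEPOT]) <= fuel + 1e-9
        ]
        if feasible:
            # Staleness on arrival: time since the target was last seen.
            nxt = max(feasible, key=lambda j: t + dist(points[pos], points[j]) - last[j])
        else:
            nxt = DEPOT  # no safe target left, so go back and refuel
        d = dist(points[pos], points[nxt])
        t, fuel, pos = t + d, fuel - d, nxt
        route.append(nxt)
        if nxt == DEPOT:
            fuel = fuel_cap
        else:
            last[nxt] = t
    return route


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # depot + two targets
    r = greedy_route(pts, fuel_cap=4.0, steps=6)
    print(r, simulate(pts, r, fuel_cap=4.0))
```

With the depot at (0, 0), targets at (1, 0) and (0, 1), and a capacity of 4, the greedy rule yields the cycle [1, 2, 0, 1, 2, 0], whose maximum revisit time is 2 + √2. In a learning-based approach, a trained policy would take the place of the hand-written rule for choosing `nxt`, and an evaluator like `simulate` could be used to compare it against such baselines.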