This article presents a deep reinforcement learning approach to a persistent surveillance mission in which a single unmanned aerial vehicle, initially stationed at a depot and subject to fuel or time-of-flight constraints, must repeatedly visit a set of targets of equal priority. Because of these constraints, the vehicle must return to the depot regularly to refuel or recharge its battery. The objective is to determine a sequence of visits to the targets that minimizes the maximum time elapsed between successive visits to any target while ensuring that the vehicle never runs out of fuel or charge. We develop a deep reinforcement learning algorithm for this problem and report numerical experiments that corroborate its effectiveness in comparison with common-sense greedy heuristics.
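To make the objective concrete, the following is a minimal sketch of how a candidate visit sequence could be scored. It is not the article's algorithm: the function name `evaluate_route`, the planar coordinates, unit flight speed, instantaneous refueling at the depot, and the convention that every target's clock starts at time zero are all assumptions introduced here for illustration.

```python
import math


def evaluate_route(coords, route, fuel_capacity):
    """Return (feasible, max_revisit_gap) for a route that starts at the depot.

    Assumptions (not from the article): coords[0] is the depot, all other
    indices are targets, the vehicle flies at unit speed so distance equals
    time and fuel use, and refueling at the depot takes no time.
    """
    n = len(coords)
    last_visit = [0.0] * n   # every target's clock starts at t = 0
    max_gap = 0.0
    t = 0.0
    fuel = fuel_capacity

    for prev, cur in zip(route, route[1:]):
        leg = math.dist(coords[prev], coords[cur])
        if leg > fuel + 1e-9:
            # The vehicle would run out of fuel on this leg.
            return False, math.inf
        t += leg
        fuel -= leg
        if cur == 0:
            fuel = fuel_capacity  # back at the depot: refuel
        else:
            max_gap = max(max_gap, t - last_visit[cur])
            last_visit[cur] = t

    # Targets not revisited before the route ends still accumulate waiting time.
    for k in range(1, n):
        max_gap = max(max_gap, t - last_visit[k])
    return True, max_gap


# Hypothetical instance: depot at the origin and two targets.
coords = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
print(evaluate_route(coords, [0, 1, 0, 2, 0], fuel_capacity=8.0))  # (True, 11.0)
```

In this toy instance, visiting both targets in one sortie (`[0, 1, 2, 0]`) is infeasible with a capacity of 8 because the final leg back to the depot cannot be flown, which illustrates how the fuel constraint restricts the set of admissible visit sequences that any policy, learned or greedy, must choose from.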