Q-learning Based System for Path Planning with UAV Swarms in Obstacle Environments

Path planning methods for the autonomous control of Unmanned Aerial Vehicle (UAV) swarms are gaining attention because of the advantages they offer. A growing number of scenarios require autonomous control of multiple UAVs, and most of them contain many obstacles, such as power lines or trees. Operating all UAVs autonomously reduces personnel costs, and optimal flight paths reduce energy consumption, leaving more battery time for other operations. This paper proposes a Reinforcement Learning based system that uses Q-Learning to solve this problem in environments with obstacles. Q-Learning allows a model, in this case an Artificial Neural Network, to adjust itself by learning from its mistakes and successes. Regardless of the size of the map or the number of UAVs in the swarm, the goal of the paths is to ensure complete coverage of an area with fixed obstacles, for tasks such as field prospecting. No goals need to be set, and no prior information is required beyond the provided map. Five maps of different sizes with different obstacles were used for experimentation, and the experiments were performed with different numbers of UAVs. Results are measured as the number of actions taken by all UAVs to complete the task in each experiment: the lower the number of actions, the shorter the path and the lower the energy consumption. The results are satisfactory and show that the system finds solutions in fewer movements as the number of UAVs increases. These results are also compared with another state-of-the-art approach.
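
As a rough illustration of the Q-Learning idea named in the abstract, the sketch below shows the standard Q-learning update, a grid move that respects fixed obstacles, and the action-count metric. It is not the authors' system: the abstract describes an Artificial Neural Network, whereas this sketch swaps in a plain lookup table for brevity. The function names (`q_update`, `step`, `total_actions`), the grid encoding (0 = free cell, 1 = obstacle), the four-action move set, and the values of `alpha` and `gamma` are all assumptions made for illustration.

```python
# Hypothetical move set: up, down, left, right (an assumption, not from the paper).
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update in place and return the new value.

    q is a table (list of lists) indexed as q[state][action]; a tabular
    stand-in for the neural network mentioned in the abstract.
    """
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def step(grid, pos, action):
    """Return the new (row, col) after an action.

    Moves that would leave the map or enter an obstacle cell (value 1)
    leave the position unchanged.
    """
    d_row, d_col = ACTIONS[action]
    row, col = pos[0] + d_row, pos[1] + d_col
    inside = 0 <= row < len(grid) and 0 <= col < len(grid[0])
    if inside and grid[row][col] == 0:
        return (row, col)
    return pos


def total_actions(action_sequences):
    """Metric described in the abstract: actions taken by all UAVs combined."""
    return sum(len(seq) for seq in action_sequences)
```

Under these assumptions, a lower `total_actions` value corresponds to shorter combined paths, which the abstract associates with lower energy consumption.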
