Q-Learning based system for path planning with unmanned aerial vehicles swarms in obstacle environments

Path planning methods for the autonomous control of Unmanned Aerial Vehicle (UAV) swarms are attracting growing interest because of the advantages they offer. A growing number of scenarios require autonomous control of multiple UAVs, and most of them contain many obstacles, such as power lines or trees. If all UAVs can operate autonomously, personnel costs decrease. In addition, if their flight paths are optimal, energy consumption is reduced, leaving more battery time for other operations. This paper proposes a Reinforcement Learning based system that uses Q-Learning to solve this problem in environments with obstacles. Q-Learning allows a model, in this case an Artificial Neural Network, to adjust itself by learning from its mistakes and successes. Regardless of the size of the map or the number of UAVs in the swarm, the goal of the computed paths is to ensure complete coverage of an area with fixed obstacles, for tasks such as field prospecting. No goals need to be set, and no prior information is required beyond the provided map. The experiments used five maps of different sizes with different obstacles, each tested with different numbers of UAVs. Results are measured as the number of actions taken by all UAVs to complete the task in each experiment: the lower the number of actions, the shorter the path and the lower the energy consumption. The results are satisfactory and show that the system finds solutions in fewer movements as the number of UAVs increases. To put them in context, the results are compared with another state-of-the-art approach.
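
To make the Q-Learning idea and the action-count metric concrete, the following is a minimal sketch and not the system described in the abstract. It assumes a single UAV, a hypothetical 3x3 map with one obstacle, and a lookup table in place of the Artificial Neural Network. The reward values, the learning rate `alpha`, the discount `gamma`, and all function names are illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical 3x3 map (an assumption): '#' is a fixed obstacle, '.' a free cell.
GRID = ["...",
        ".#.",
        "..."]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
FREE = {(r, c) for r, row in enumerate(GRID)
        for c, ch in enumerate(row) if ch == "."}
START = (0, 0)


def step(pos, visited, action):
    """Apply one action. Returns (pos, visited, reward, done)."""
    target = (pos[0] + action[0], pos[1] + action[1])
    if target not in FREE:
        # Obstacle or map edge: the UAV stays in place and is penalised.
        return pos, visited, -1.0, False
    # Assumed reward shaping: reward new cells, discourage revisits.
    reward = 1.0 if target not in visited else -0.5
    visited = visited | {target}
    return target, visited, reward, visited == FREE


def q_update(q, state, a, reward, next_state, done, alpha=0.5, gamma=0.9):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = 0.0 if done else max(q[next_state])
    q[state][a] += alpha * (reward + gamma * best_next - q[state][a])


def train(episodes=2000, epsilon=0.2, max_steps=50, seed=0):
    """Epsilon-greedy tabular Q-learning; the state is (position, visited cells)."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * len(ACTIONS))
    for _ in range(episodes):
        pos, visited = START, frozenset({START})
        for _ in range(max_steps):
            state = (pos, visited)
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
            pos, visited, reward, done = step(pos, visited, ACTIONS[a])
            q_update(q, state, a, reward, (pos, visited), done)
            if done:
                break
    return q


def greedy_action_count(q, max_steps=50):
    """Actions the greedy policy needs to cover the map, or None if it fails."""
    pos, visited = START, frozenset({START})
    for n in range(1, max_steps + 1):
        state = (pos, visited)
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        pos, visited, _, done = step(pos, visited, ACTIONS[a])
        if done:
            return n
    return None


if __name__ == "__main__":
    q_table = train()
    print("actions to full coverage:", greedy_action_count(q_table))
```

Because the visited set is part of the state, the table grows exponentially with map size. That growth is one reason a function approximator such as a neural network is a natural replacement for the table on larger maps. The value returned by `greedy_action_count` plays the role of the metric described above: fewer actions means a shorter path.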
