Multi-Agent Deep Reinforcement Learning For Optimising Energy Efficiency of Fixed-Wing UAV Cellular Access Points

Unmanned Aerial Vehicles (UAVs) promise to become an intrinsic part of next-generation communications, as they can be deployed to provide wireless connectivity to ground users and supplement existing terrestrial networks. Most existing research into UAV access points for cellular coverage considers rotary-wing UAV designs (e.g. quadcopters). However, we expect fixed-wing UAVs to be more appropriate for connectivity in scenarios where long flight times are necessary (such as rural coverage), because fixed-wing flight is more energy-efficient than rotary-wing flight. As fixed-wing UAVs are typically incapable of hovering in place, optimising their deployment means optimising their individual flight trajectories so that they deliver high-quality service to ground users in an energy-efficient manner. In this paper, we propose a multi-agent deep reinforcement learning approach that optimises the energy efficiency of fixed-wing UAV cellular access points while still allowing them to deliver high-quality service to users on the ground. In our decentralised approach, each UAV is equipped with a Dueling Deep Q-Network (DDQN) agent that adjusts the UAV's 3D trajectory over a series of timesteps. By coordinating with their neighbours, the UAVs adjust their individual flight trajectories in a manner that optimises total system energy efficiency. We benchmark our approach against a series of heuristic trajectory-planning strategies and show that it can improve system energy efficiency by as much as 70%.
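To illustrate what the "dueling" part of each UAV's agent computes, here is a minimal sketch of the standard dueling aggregation step in plain Python: the network's state-value stream V(s) and advantage stream A(s, a) are recombined into Q-values, and the agent picks the greedy action. The nine-action set (heading change combined with altitude change) and the stream outputs are hypothetical; the abstract does not specify the paper's action space, network architecture or reward.

```python
from typing import List, Sequence, Tuple

# HYPOTHETICAL discrete action set for a fixed-wing UAV: each action pairs a
# heading change with an altitude change. The paper's real action space is
# not given in the abstract.
ACTIONS: List[Tuple[str, str]] = [
    (turn, climb)
    for turn in ("left", "straight", "right")
    for climb in ("descend", "hold", "climb")
]


def dueling_q_values(value: float, advantages: Sequence[float]) -> List[float]:
    """Combine the two dueling streams: Q(s, a) = V(s) + A(s, a) - mean_b A(s, b)."""
    if not advantages:
        raise ValueError("need at least one action")
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def greedy_action(value: float, advantages: Sequence[float]) -> int:
    """Index of the highest Q-value; max() keeps the first index on ties."""
    q = dueling_q_values(value, advantages)
    return max(range(len(q)), key=q.__getitem__)


if __name__ == "__main__":
    # Made-up stream outputs for a single state, purely for illustration.
    v = 2.0
    adv = [0.1, 0.4, -0.2, 0.0, 0.9, 0.3, -0.5, 0.2, 0.1]
    idx = greedy_action(v, adv)
    print(ACTIONS[idx], round(dueling_q_values(v, adv)[idx], 3))
```

Because the mean advantage is subtracted, the average of the resulting Q-values always equals V(s), which is what makes the split between the two streams identifiable. In a real agent both streams would be outputs of a trained neural network rather than fixed numbers.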

Deep Reinforcement Learning

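Deep reinforcement learning (DRL) is a machine learning approach that extends traditional reinforcement learning methods with deep learning techniques. The central task in reinforcement learning is for an agent to learn, from the rewards it receives from its environment, the behaviour that maximises its cumulative reward. In practice, model-free reinforcement learning methods need function approximation so that the agent can learn a value function or a policy. In this setting, the strong function-approximation capability of deep learning is a natural replacement for hand-crafted features, and it makes higher-performing end-to-end learning possible.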
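To make the role of function approximation concrete, here is a minimal sketch of one semi-gradient Q-learning update with a linear approximator, Q(s, a) ≈ w_a · φ(s), in plain Python. The feature vectors, learning rate `alpha` and discount `gamma` are illustrative assumptions. A deep RL agent, such as the DQN-family agents mentioned above, replaces the linear map with a neural network but trains it towards the same kind of temporal-difference target.

```python
from typing import List, Sequence

# One weight vector per discrete action; Q(s, a) is approximated as w[a] . phi(s).
Weights = List[List[float]]


def q_value(w: Weights, phi: Sequence[float], a: int) -> float:
    """Linear estimate of the action value Q(s, a)."""
    return sum(wi * xi for wi, xi in zip(w[a], phi))


def q_learning_update(
    w: Weights,
    phi: Sequence[float],
    a: int,
    reward: float,
    phi_next: Sequence[float],
    done: bool,
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply one semi-gradient Q-learning step to w[a] in place; return the TD error.

    target = r + gamma * max_b Q(s', b)   (just r if the episode has ended)
    w[a]  <- w[a] + alpha * (target - Q(s, a)) * phi(s)
    """
    next_best = 0.0 if done else max(q_value(w, phi_next, b) for b in range(len(w)))
    td_error = reward + gamma * next_best - q_value(w, phi, a)
    # The gradient of the linear estimate w[a] . phi with respect to w[a] is phi.
    w[a] = [wi + alpha * td_error * xi for wi, xi in zip(w[a], phi)]
    return td_error


if __name__ == "__main__":
    # Toy example: 2 actions, 2 made-up features, all weights start at zero.
    w = [[0.0, 0.0], [0.0, 0.0]]
    td = q_learning_update(w, phi=[1.0, 0.5], a=1, reward=1.0,
                           phi_next=[0.0, 1.0], done=False)
    print(td, w[1])
```

Only the weights of the action actually taken are updated, and the bootstrap term is dropped at episode end; both are the usual conventions for Q-learning with function approximation.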
