Unmanned Aerial Vehicles (UAVs) are increasingly used as aerial base stations to provide ad hoc communications infrastructure. Building on prior research that considers only static nodes, 2D trajectories, or single-UAV systems, this paper focuses on using multiple UAVs to provide wireless communication to mobile users in the absence of terrestrial communications infrastructure. In particular, we jointly optimize the UAVs' 3D trajectories and non-orthogonal multiple access (NOMA) power allocation to maximize system throughput. First, a weighted K-means-based clustering algorithm establishes UAV-user associations at regular intervals. We then explore the efficacy of training a novel Shared Deep Q-Network (SDQN) with action masking. Rather than training each UAV separately with its own DQN, the SDQN learns from the experiences of multiple UAVs instead of a single agent, which reduces training time. We also show that the SDQN can be used to train a multi-agent system whose agents have differing action spaces. Simulation results confirm that: 1) a shared DQN outperforms a conventional DQN in terms of maximum system throughput (+20%) and training time (-10%); 2) the SDQN converges for agents with different action spaces, yielding a 9% increase in throughput compared with mutual learning algorithms; and 3) combining NOMA with the SDQN architecture achieves a higher sum rate than existing baseline schemes.
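The abstract does not specify how action masking is implemented, so the following is only a minimal sketch of one common approach: invalid actions are excluded before the epsilon-greedy choice, and their Q-values are set to negative infinity before the greedy argmax. The function name `masked_action`, the `valid` flag list, and the example values are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import Optional, Sequence


def masked_action(q_values: Sequence[float],
                  valid: Sequence[bool],
                  epsilon: float = 0.0,
                  rng: Optional[random.Random] = None) -> int:
    """Epsilon-greedy action selection restricted to valid actions.

    q_values: one Q-value per action, e.g. the output of a shared Q-network.
    valid:    valid[i] is True if action i is allowed for this agent now.
    Both arguments are hypothetical inputs chosen for illustration.
    """
    if len(q_values) != len(valid):
        raise ValueError("q_values and valid must have the same length")
    allowed = [i for i, ok in enumerate(valid) if ok]
    if not allowed:
        raise ValueError("at least one action must be valid")
    rng = rng or random
    # Exploration step: sample uniformly, but only among allowed actions.
    if rng.random() < epsilon:
        return rng.choice(allowed)
    # Greedy step: masked actions get -inf so they can never win the argmax.
    masked = [q if ok else -math.inf for q, ok in zip(q_values, valid)]
    return max(range(len(masked)), key=masked.__getitem__)


# Example: action 1 has the highest Q-value but is masked out (for instance,
# a hypothetical "ascend" action for a UAV already at its altitude limit).
best = masked_action([0.2, 1.5, 0.7, 0.9], [True, False, True, True])
print(best)  # 3
```

Under this assumption, a single network could output Q-values over the union of all agents' actions, with each agent's mask hiding the actions it does not have. That is one plausible way for a shared network to serve agents with differing action spaces, though the abstract does not confirm that this is the mechanism used.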