Deep reinforcement learning (DRL) has been extensively applied to multi-unmanned aerial vehicle (UAV) networks (MUNs) to enable real-time adaptation to complex, time-varying environments. Nevertheless, most existing works assume a stationary user distribution (UD) or a dynamic one that follows predictable patterns. Such assumptions may render UD-specific strategies ineffective when a MUN is deployed in unknown environments. To this end, this paper investigates the distributed user connectivity maximization problem in a MUN with generalization to arbitrary UDs. Specifically, the problem is first formulated as a time-coupled combinatorial nonlinear non-convex optimization with an arbitrary underlying UD. To make the optimization tractable, a multi-agent CNN-enhanced deep Q-learning (MA-CDQL) algorithm is proposed. The algorithm integrates a ResNet-based CNN into the policy network to analyze the input UD in real time and derive optimal decisions from the extracted high-level UD features. To improve learning efficiency and avoid local optima, a heatmap algorithm is developed to transform the raw UD into a continuous density map, which then forms part of the actual input to the policy network. Simulations demonstrate the efficacy of UD heatmaps and the proposed algorithm in maximizing user connectivity compared with K-means-based methods.
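The heatmap transformation described above — turning discrete user positions into a continuous density map suitable as CNN input — can be sketched with a Gaussian kernel density sum. This is a minimal illustrative version, not the paper's exact algorithm; all function names, the grid size, coverage area, and the kernel width `sigma` are assumptions for illustration.

```python
import numpy as np

def ud_heatmap(user_xy, grid=(32, 32), area=(1000.0, 1000.0), sigma=60.0):
    """Convert raw user positions (N x 2 array of metres) into a continuous
    density map by summing an isotropic Gaussian kernel centred on each user.
    Illustrative sketch: grid resolution, area, and sigma are assumed values."""
    gx = np.linspace(0.0, area[0], grid[0])
    gy = np.linspace(0.0, area[1], grid[1])
    xx, yy = np.meshgrid(gx, gy, indexing="ij")
    heat = np.zeros(grid)
    for ux, uy in user_xy:
        heat += np.exp(-((xx - ux) ** 2 + (yy - uy) ** 2) / (2.0 * sigma ** 2))
    # Normalise to [0, 1] so the map is a well-scaled CNN input channel.
    return heat / max(heat.max(), 1e-12)
```

Compared with feeding raw coordinates, such a smoothed map gives the policy network a translation-friendly spatial signal from which a CNN can extract high-level density features.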
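MA-CDQL builds on standard deep Q-learning, where each agent (UAV) selects actions epsilon-greedily from its local Q-estimates and learns from one-step bootstrapped targets. A minimal sketch of these generic ingredients follows; it shows the textbook Q-learning target and exploration rule, not the paper's multi-agent training loop.

```python
import numpy as np

def dqn_target(q_next, reward, gamma=0.99, done=False):
    """One-step deep Q-learning target: r + gamma * max_a' Q(s', a').
    q_next is the Q-value vector from the target network at the next state."""
    return reward + (0.0 if done else gamma * float(np.max(q_next)))

def epsilon_greedy(q_values, eps, rng):
    """Each agent picks a random action with probability eps (exploration),
    otherwise the action with the highest estimated Q-value (exploitation)."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```

In the proposed algorithm the Q-values themselves come from a ResNet-based CNN that consumes the UD heatmap, rather than a plain feed-forward network.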