DNN Task Assignment in UAV Networks: A Generative AI Enhanced Multi-Agent Reinforcement Learning Approach

Unmanned aerial vehicles (UAVs) offer high mobility and flexible deployment, which has driven their adoption across a range of Internet of Things (IoT) application scenarios. These capabilities mean that UAVs are given increasingly critical and complex tasks in uncertain and potentially harsh environments. The large volume of data these applications generate must be processed and analyzed by deep neural networks (DNNs), yet UAVs have limited onboard computing resources for running DNN models. This paper presents a joint approach that combines multi-agent reinforcement learning (MARL) and generative diffusion models (GDMs) to assign DNN tasks across a UAV swarm, with the aim of reducing latency from task capture to result output. In the first stage, we treat the task size of the target area to be inspected and the shortest flying path as optimization constraints, and use a greedy algorithm to solve the resulting subproblem, minimizing the UAV flying path and the overall system cost. In the second stage, we introduce a novel DNN task assignment algorithm, termed GDM-MADDPG, which uses the reverse denoising process of a GDM in place of the actor network in multi-agent deep deterministic policy gradient (MADDPG). The algorithm generates specific DNN task assignment actions from the agents' observations in a dynamic environment. Simulation results indicate that our algorithm performs favorably compared to benchmarks in terms of path planning, Age of Information (AoI), energy consumption, and task load balancing.
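
To make the idea of a reverse denoising process acting as the actor more concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a standard DDPM-style linear noise schedule, a per-UAV load vector as the observation, and an action vector read as one assignment score per UAV. The names `make_schedule`, `target_scores`, `predict_noise`, and `denoise_action` are hypothetical. In place of a trained noise-prediction network, `predict_noise` uses a closed-form stand-in that is exact only when the clean action is a known function of the observation.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule; returns (betas, cumulative alpha products)."""
    betas = [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
             for t in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return betas, alpha_bars


def target_scores(obs):
    """Hypothetical 'clean' action: score each UAV by its negative load."""
    return [-load for load in obs]


def predict_noise(x_t, alpha_bar_t, obs):
    """Stand-in for a trained noise-prediction network (an assumption).

    If the clean action x_0 equals target_scores(obs), then
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    can be solved for eps in closed form.
    """
    x_0 = target_scores(obs)
    scale = math.sqrt(1.0 - alpha_bar_t)
    return [(x - math.sqrt(alpha_bar_t) * m) / scale for x, m in zip(x_t, x_0)]


def denoise_action(obs, num_steps=20, rng=None):
    """Generate an action by running the reverse process from Gaussian noise."""
    if rng is None:
        rng = random.Random()
    betas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in obs]  # start from standard normal noise
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, alpha_bars[t], obs)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(1.0 - betas[t])
                for xi, ei in zip(x, eps)]
        # Fresh noise is added at every step except the final one.
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return x


if __name__ == "__main__":
    loads = [0.9, 0.2, 0.5]  # hypothetical per-UAV load observations
    scores = denoise_action(loads, rng=random.Random(0))
    chosen = max(range(len(scores)), key=scores.__getitem__)
    print(scores, chosen)
```

With this stand-in predictor, the final noise-free step recovers the target scores up to floating-point error, so the least-loaded UAV (index 1 in the usage example) receives the highest score. In an actual GDM-MADDPG setup, the noise predictor would presumably be a learned network trained with feedback from the MADDPG critic. That training loop is outside the scope of this sketch.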
