Unmanned Aerial Vehicles (UAVs) offer high mobility and flexible deployment, which has driven their adoption across a range of Internet of Things (IoT) application scenarios. These capabilities lead to increasingly critical and complex tasks in uncertain and potentially harsh environments. The large volume of data generated by such applications must be processed and analyzed by deep neural networks (DNNs), yet UAVs have limited computing resources for running DNN models. This paper presents a joint approach that combines multi-agent reinforcement learning (MARL) and generative diffusion models (GDM) to assign DNN tasks to a UAV swarm, with the aim of reducing the latency from task capture to result output. In the first stage, we treat the task size of each target area to be inspected and the shortest flying path as optimization constraints, and use a greedy algorithm to solve the resulting subproblem, minimizing the UAVs' flying paths and the overall system cost. In the second stage, we introduce a novel DNN task assignment algorithm, termed GDM-MADDPG, which replaces the actor network in multi-agent deep deterministic policy gradient (MADDPG) with the reverse denoising process of a GDM. The resulting actor generates specific DNN task assignment actions from the agents' observations in a dynamic environment. Simulation results indicate that our algorithm compares favorably with the benchmarks in terms of path planning, Age of Information (AoI), energy consumption, and task load balancing.
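To make the second-stage idea concrete, the following is a minimal sketch of how a reverse denoising process could stand in for an actor network. It is not the paper's implementation. It assumes a standard DDPM-style reverse step with a linear variance schedule, and every name in it (`make_schedule`, `toy_noise_predictor`, `denoise_action`) is hypothetical. The observation is assumed to be a per-UAV feature vector, and the toy noise predictor is a hand-written placeholder for the trained, observation-conditioned network.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (an assumed choice, common in DDPM-style models)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    betas = [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
             for t in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a  # cumulative product of the alphas
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def toy_noise_predictor(x, t, obs):
    """Hypothetical placeholder for the trained noise-prediction network.

    A real actor would be a neural network conditioned on (x, t, obs).
    """
    return [xi - oi for xi, oi in zip(x, obs)]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def denoise_action(obs, num_steps=10, predictor=toy_noise_predictor, rng=None):
    """Generate an action by iterating the reverse denoising process.

    Starts from Gaussian noise and applies the DDPM reverse step
    x_{t-1} = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t) + sigma_t * z,
    with sigma_t = sqrt(beta_t) and no noise added at the final step.
    The result is mapped to assignment fractions over UAVs with a softmax
    (an assumed output mapping).
    """
    if rng is None:
        rng = random.Random(0)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in obs]  # pure noise, one entry per UAV
    for t in reversed(range(num_steps)):
        eps = predictor(x, t, obs)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # final step is deterministic
    return softmax(x)


if __name__ == "__main__":
    # Hypothetical observation: one feature per UAV in a three-UAV swarm.
    print(denoise_action([0.2, 0.5, 0.3]))
```

In a MADDPG-style setup, each agent would call something like `denoise_action` on its own observation in place of a forward pass through a deterministic actor, while the centralized critic would be trained as usual. The softmax output is only one possible way to turn the denoised vector into a task split.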