This paper addresses catastrophic forgetting in mobile edge UAV networks within dynamic spatiotemporal environments. Conventional deep reinforcement learning often fails during task transitions, necessitating costly retraining to adapt to new user distributions. We propose the spatiotemporal continual learning (STCL) framework, realized through the group-decoupled multi-agent proximal policy optimization (G-MAPPO) algorithm. The core innovation lies in the integration of a group-decoupled policy optimization (GDPO) mechanism with a gradient orthogonalization layer to balance heterogeneous objectives including energy efficiency, user fairness, and coverage. This combination employs dynamic z-score normalization and gradient projection to mitigate conflicts without offline resets. Furthermore, 3D UAV mobility serves as a spatial compensation layer to manage extreme density shifts. Simulations demonstrate that the STCL framework ensures resilience, with service reliability recovering to over 0.9 for moderate loads of up to 100 users. Even under extreme saturation with 140 users, G-MAPPO maintains a significant performance lead over the multi-agent deep deterministic policy gradient (MADDPG) baseline by preventing policy stagnation. The algorithm delivers an effective capacity gain of 20 percent under high traffic loads, validating its potential for scalable aerial edge swarms.
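As a rough illustration of the two conflict-mitigation steps named above, the following Python sketch normalizes per-objective signals with a z-score and then combines per-objective gradients after projecting away their conflicting components. It is not the G-MAPPO implementation. The function names `zscore` and `project_conflicts` are hypothetical, the projection rule is assumed to be a PCGrad-style pairwise projection, and the statistics are computed per batch rather than with whatever dynamic update the paper uses.

```python
import numpy as np


def zscore(x, eps=1e-8):
    """Rescale a batch of values to zero mean and (approximately) unit variance.

    Assumption: statistics are taken over the current batch only.
    """
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + eps)


def project_conflicts(grads, eps=1e-12):
    """Combine per-objective gradients after removing pairwise conflicts.

    Assumption: a PCGrad-style rule. Whenever gradient i has a negative
    inner product with gradient j, the component of i along j is removed.
    Projections always use the original (unmodified) gradient j.
    """
    grads = [np.asarray(g, dtype=float) for g in grads]
    projected = []
    for i, g in enumerate(grads):
        g = g.copy()
        for j, other in enumerate(grads):
            if i == j:
                continue
            dot = g @ other
            if dot < 0:  # the two objectives pull in opposing directions
                g -= dot / (other @ other + eps) * other
        projected.append(g)
    return np.sum(projected, axis=0)


# Toy example: two objectives (say, energy efficiency and coverage)
# whose gradients partially oppose each other.
g_energy = np.array([1.0, 0.0])
g_coverage = np.array([-1.0, 1.0])
combined = project_conflicts([g_energy, g_coverage])

# Per-objective rewards on different scales become comparable after scaling.
normalized_rewards = zscore([1.0, 2.0, 3.0])
```

In this toy case the combined update has a non-negative inner product with both original gradients, so neither objective is pushed backwards by the step, which is the intent of projecting before summing.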