This paper addresses the critical challenge of coordinating mobile edge UAV networks to maintain robust service in highly dynamic spatiotemporal environments. Conventional Deep Reinforcement Learning (DRL) approaches often suffer from catastrophic forgetting when transitioning between distinct task scenarios, such as moving from dense urban clusters to sparse rural areas. These transitions typically necessitate computationally expensive retraining or model resets to adapt to new user distributions, leading to service interruptions. To overcome these limitations, we propose a computationally efficient Spatiotemporal Continual Learning (STCL) framework realized through a Group-Decoupled Multi-Agent Proximal Policy Optimization (G-MAPPO) algorithm. Our approach integrates a novel Group-Decoupled Policy Optimization (GDPO) mechanism that applies dynamic $z$-score normalization to autonomously balance heterogeneous objectives, including energy efficiency, user fairness, and coverage. This mechanism effectively mitigates gradient conflicts induced by concept drift without requiring offline retraining. Furthermore, the framework leverages the 3D mobility of UAVs as a spatial compensation layer, enabling the swarm to autonomously adjust altitudes to accommodate extreme user-density fluctuations. Extensive simulations demonstrate that the proposed STCL framework achieves superior resilience, characterized by an elastic recovery of service reliability to approximately 0.95 during phase transitions. Compared to the MADDPG baseline, G-MAPPO not only prevents knowledge forgetting but also delivers an effective capacity gain of 20\% under extreme traffic loads, validating its potential as a scalable solution for edge-enabled aerial swarms.
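To make the dynamic $z$-score normalization idea concrete, the sketch below shows one common way to keep heterogeneous reward components (energy, fairness, coverage) on a comparable scale using running statistics. This is an illustrative assumption, not the paper's exact GDPO update rule; the class name, Welford-style running variance, and equal-weight combination are all hypothetical choices for exposition.

```python
import numpy as np

class ZScoreRewardNormalizer:
    """Running per-objective z-score normalization (illustrative sketch,
    not the paper's GDPO algorithm). Tracks a running mean and variance
    for each reward component so objectives with very different scales,
    e.g. energy (J), fairness (unitless), coverage (users), contribute
    comparably to the combined reward."""

    def __init__(self, n_objectives, eps=1e-8):
        self.count = 0
        self.mean = np.zeros(n_objectives)
        self.m2 = np.zeros(n_objectives)  # sum of squared deviations (Welford)
        self.eps = eps

    def update(self, rewards):
        """Fold one vector of raw objective rewards into the running stats."""
        rewards = np.asarray(rewards, dtype=float)
        self.count += 1
        delta = rewards - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (rewards - self.mean)

    def normalize(self, rewards):
        """Return (r - mean) / (std + eps) per objective."""
        rewards = np.asarray(rewards, dtype=float)
        var = self.m2 / max(self.count - 1, 1)
        return (rewards - self.mean) / (np.sqrt(var) + self.eps)

# Hypothetical usage: three objectives on very different raw scales.
norm = ZScoreRewardNormalizer(n_objectives=3)
for step_rewards in [[10.0, 0.2, 300.0], [12.0, 0.1, 280.0], [9.0, 0.3, 310.0]]:
    norm.update(step_rewards)

# Combine normalized objectives with equal weights into a scalar reward.
scalar_reward = float(np.mean(norm.normalize([11.0, 0.25, 305.0])))
```

Because each component is rescaled by its own running statistics, no single objective dominates the policy gradient when the environment shifts, which is the intuition behind using such normalization to soften gradient conflicts under concept drift.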