Unmanned aerial vehicles (UAVs) serving as aerial base stations can rapidly restore connectivity after disasters, yet abrupt changes in user mobility and traffic demand shift the quality-of-service trade-offs and induce strong non-stationarity. Under such shifts, deep reinforcement learning policies suffer from plasticity loss, as representation collapse and neuron dormancy impair adaptation. We propose the plasticity-enhanced multi-agent mixture of experts (PE-MAMoE), a centralized-training, decentralized-execution (CTDE) framework built on multi-agent proximal policy optimization (MAPPO). PE-MAMoE equips each UAV with a sparsely gated mixture-of-experts actor whose router selects a single specialist expert per step. A non-parametric Phase Controller injects brief, expert-only stochastic perturbations after phase switches, resets the action log-standard-deviation, anneals the entropy coefficient and learning rate, and schedules the router temperature, all to restore the policy's plasticity without destabilizing safe behaviors. We derive a dynamic regret bound showing that the tracking error scales with both the environment variation and the cumulative noise energy. In a phase-driven simulator with mobile users and 3GPP-style channels, PE-MAMoE improves the normalized interquartile mean (IQM) return by 26.3\% over the best baseline, increases served-user capacity by 12.8\%, and reduces collisions by approximately 75\%. Diagnostics confirm persistently higher expert feature rank and periodic recovery of dormant neurons at regime switches.
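To make the actor-side mechanics concrete, the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. Everything here is hypothetical: the names `Top1MoEActor` and `phase_schedule`, linear experts and a linear router, a state-independent log-standard-deviation vector, and all numeric defaults (burst length, annealing window, start and end values). Because the top-1 argmax is unchanged by any positive temperature, the sketch assumes the router temperature acts through sampled routing during training, while execution picks the single most likely expert.

```python
import numpy as np


class Top1MoEActor:
    """Hypothetical top-1 (sparsely gated) mixture-of-experts Gaussian actor.

    Assumed simplifications: each expert is one linear map from observation to
    action mean, the router is one linear map to expert logits, and the
    log-standard-deviation is a state-independent vector.
    """

    def __init__(self, obs_dim, act_dim, n_experts, init_log_std=-0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_experts = n_experts
        self.router_w = self.rng.normal(0.0, 0.1, size=(obs_dim, n_experts))
        self.expert_w = self.rng.normal(0.0, 0.1, size=(n_experts, obs_dim, act_dim))
        self.init_log_std = np.full(act_dim, init_log_std)
        self.log_std = self.init_log_std.copy()
        self.temperature = 1.0

    def route(self, obs):
        # Tempered softmax over expert logits (max-subtracted for numerical stability).
        logits = obs @ self.router_w / self.temperature
        probs = np.exp(logits - logits.max())
        return probs / probs.sum()

    def act(self, obs, greedy=True):
        probs = self.route(obs)
        # Greedy (execution): the single most likely expert.
        # Sampled (assumed training mode): temperature controls routing exploration.
        if greedy:
            k = int(np.argmax(probs))
        else:
            k = int(self.rng.choice(self.n_experts, p=probs))
        mean = obs @ self.expert_w[k]
        action = mean + np.exp(self.log_std) * self.rng.standard_normal(mean.shape)
        return action, k, probs

    def perturb_experts(self, sigma):
        # Expert-only perturbation: router weights are deliberately left untouched.
        self.expert_w += sigma * self.rng.standard_normal(self.expert_w.shape)

    def reset_log_std(self):
        # Restore the initial exploration scale after a phase switch.
        self.log_std = self.init_log_std.copy()


def phase_schedule(steps_since_switch, noise_steps=20, window=200, sigma0=0.05,
                   ent=(0.02, 0.005), lr=(3e-4, 1e-4), temp=(2.0, 1.0)):
    """Hypothetical post-switch schedule: brief noise burst, then linear annealing."""
    frac = min(steps_since_switch / window, 1.0)

    def lerp(pair):
        start, end = pair
        return start + frac * (end - start)

    return {
        "sigma": sigma0 if steps_since_switch < noise_steps else 0.0,
        "ent_coef": lerp(ent),
        "lr": lerp(lr),
        "temperature": lerp(temp),
    }


# Usage: the first few steps after a detected phase switch.
actor = Top1MoEActor(obs_dim=6, act_dim=2, n_experts=4)
obs = np.ones(6)
for t in range(3):
    sched = phase_schedule(t)
    if t == 0:
        actor.reset_log_std()
    actor.perturb_experts(sched["sigma"])
    actor.temperature = sched["temperature"]
    action, k, probs = actor.act(obs, greedy=False)
```

In a full MAPPO training loop, the `ent_coef` and `lr` values returned by the schedule would be passed to the PPO loss and optimizer, which this sketch omits.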