Low-Altitude Satellite-AAV Collaborative Joint Mobile Edge Computing and Data Collection via Diffusion-based Deep Reinforcement Learning

The integration of satellite and autonomous aerial vehicle (AAV) communications has become essential for scenarios requiring both wide coverage and rapid deployment, particularly in remote or disaster-stricken areas where terrestrial infrastructure is unavailable. Furthermore, emerging applications increasingly demand simultaneous mobile edge computing (MEC) and data collection (DC) capabilities within the same aerial network. However, jointly optimizing these operations in heterogeneous satellite-AAV systems presents significant challenges due to limited on-board resources and competing demands under dynamic channel conditions. In this work, we investigate a satellite-AAV-enabled joint MEC-DC system in which the two platforms collaborate to serve ground devices (GDs). We formulate a joint optimization problem that minimizes the average MEC end-to-end delay and the AAV energy consumption while maximizing the amount of collected data. Because this problem is a non-convex mixed-integer nonlinear programming (MINLP) problem, we propose a Q-weighted variational policy optimization-based joint AAV movement control, GD association, offloading decision, and bandwidth allocation (QAGOB) approach. We first reformulate the problem as an action space-transformed Markov decision process to accommodate the variable action dimensions and the hybrid action space. QAGOB then leverages the multi-modal generation capabilities of diffusion models to optimize the policy, achieving better sample efficiency while controlling the diffusion cost during training. Simulation results show that QAGOB outperforms five benchmarks, including traditional deep reinforcement learning (DRL) and diffusion-based DRL algorithms. Furthermore, joint MEC-DC optimization achieves significant advantages over optimizing MEC and DC separately.
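
The abstract does not say how the action space transformation is carried out. As a rough illustration only, and not the QAGOB design, the sketch below shows one common way a fixed-length continuous policy output could be decoded into a hybrid action when the number of reachable GDs varies. Every name here (`HybridAction`, `decode_action`, the `active` mask, `max_step_m`) and the vector layout itself are hypothetical assumptions introduced for this example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class HybridAction:
    heading_rad: float      # AAV flight direction in [0, 2*pi]
    distance_m: float       # AAV travel distance in this time slot
    associated: List[bool]  # per GD: served by the AAV in this slot?
    offload: List[bool]     # per GD: offload its task instead of computing locally?
    bandwidth: List[float]  # per GD: share of the AAV bandwidth (sums to 1 or 0)


def decode_action(raw: List[float], active: List[bool],
                  max_step_m: float) -> HybridAction:
    """Map a fixed-length vector in [-1, 1] to a hybrid action.

    Assumed layout (hypothetical): [heading, distance,
    n association scores, n offloading scores, n bandwidth scores].
    `active` masks out GD slots that are not reachable, so the policy
    output keeps a fixed size even when the number of GDs changes.
    """
    n = len(active)
    if len(raw) != 2 + 3 * n:
        raise ValueError("raw must have length 2 + 3 * len(active)")
    raw = [max(-1.0, min(1.0, x)) for x in raw]  # clip to [-1, 1]

    # Continuous part: rescale to physical ranges.
    heading = (raw[0] + 1.0) * math.pi            # [-1, 1] -> [0, 2*pi]
    distance = (raw[1] + 1.0) / 2.0 * max_step_m  # [-1, 1] -> [0, max_step_m]

    assoc_raw = raw[2:2 + n]
    off_raw = raw[2 + n:2 + 2 * n]
    bw_raw = raw[2 + 2 * n:]

    # Discrete part: threshold the scores, ignoring masked-out GDs.
    associated = [a and x > 0.0 for a, x in zip(active, assoc_raw)]
    offload = [s and x > 0.0 for s, x in zip(associated, off_raw)]

    # Bandwidth: normalize positive weights over associated GDs only.
    weights = [(x + 1.0) / 2.0 + 1e-6 if s else 0.0
               for s, x in zip(associated, bw_raw)]
    total = sum(weights)
    bandwidth = [w / total if total > 0 else 0.0 for w in weights]
    return HybridAction(heading, distance, associated, offload, bandwidth)
```

Under these assumptions, a policy (diffusion-based or otherwise) only ever emits a vector of length `2 + 3 * n` for a chosen maximum `n`, and the mask handles slots where fewer GDs are present. Thresholding is the simplest choice for the binary decisions; it is used here for brevity rather than because the paper is known to use it.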
