This paper addresses the joint optimization of trajectories and bandwidth allocation for multiple Unmanned Aerial Vehicles (UAVs) to improve energy efficiency in cooperative data collection. We focus on an important yet underestimated aspect of the system: actions cannot be synchronized across all UAVs. Because most existing learning-based solutions are not designed to learn in such an asynchronous environment, we formulate the trajectory planning problem as a Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) and introduce an asynchronous multi-agent learning algorithm to learn the UAVs' cooperative policies. Once the UAVs' trajectory policies are learned, the bandwidth allocation can be solved optimally from local observations at each collection point. Comprehensive empirical results show that the proposed method outperforms other learning-based and heuristic baselines in both energy efficiency and mission completion time. Additionally, the learned policies are robust under varying environmental conditions.
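To illustrate why action synchronization breaks down, the following is a minimal sketch of asynchronous decision epochs in a semi-Markov setting. It is not the paper's algorithm: the function name `decision_epochs`, the `action_durations` input, and the example flight times are all hypothetical and chosen only for illustration. Each UAV executes macro-actions (for example, flying to a collection point) of different durations, and a UAV selects its next action only when its own current action ends, so decision times drift apart across the team.

```python
import heapq


def decision_epochs(action_durations, horizon):
    """List the (time, uav_id) pairs at which a UAV picks a new macro-action.

    action_durations: hypothetical dict mapping uav_id -> list of durations
    (in seconds) of the macro-actions that UAV executes, in order.
    horizon: decisions later than this time are ignored.
    """
    # Every UAV makes its first decision at t = 0.
    queue = [(0.0, uav_id, 0) for uav_id in sorted(action_durations)]
    heapq.heapify(queue)

    epochs = []
    while queue:
        # Advance the clock to the UAV whose current action ends first.
        t, uav_id, k = heapq.heappop(queue)
        if t > horizon:
            break  # the heap is time-ordered, so all remaining events are later
        durations = action_durations[uav_id]
        if k < len(durations):
            # Only this UAV decides now; the others are still mid-action.
            epochs.append((t, uav_id))
            heapq.heappush(queue, (t + durations[k], uav_id, k + 1))
    return epochs


# Assumed example: two UAVs whose macro-actions take different amounts of time.
epochs = decision_epochs({"uav0": [3.0, 3.0], "uav1": [5.0, 2.0]}, horizon=10.0)
for t, uav_id in epochs:
    print(f"t={t:4.1f}s  {uav_id} selects a new macro-action")
```

After the shared start at t = 0, the two UAVs in this example never decide at the same instant, which is the situation a synchronous, step-locked multi-agent learner is not built to handle.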