Due to their adaptability and mobility, Unmanned Aerial Vehicles (UAVs) are becoming increasingly essential for wireless network services, particularly for data harvesting tasks. In this context, Artificial Intelligence (AI)-based approaches have gained significant attention for addressing UAV path planning tasks in large and complex environments, bridging the gap with real-world deployments. However, many existing algorithms suffer from limited training data, which hampers their performance in highly dynamic environments. Moreover, they often overlook the inherently multi-objective nature of the task, treating it in an overly simplistic manner. To address these limitations, we propose an attention-based Multi-Objective Reinforcement Learning (MORL) architecture that explicitly handles the trade-off between data collection and energy consumption in urban environments, even without prior knowledge of wireless channel conditions. Our method develops a single model capable of adapting to varying trade-off preferences and dynamic scenario parameters without the need for fine-tuning or retraining. Extensive simulations show that our approach achieves substantial improvements in performance, model compactness, sample efficiency, and most importantly, generalization to previously unseen scenarios, outperforming existing RL solutions.
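To make the idea of a single model that adapts to varying trade-off preferences more concrete, the following is a minimal sketch of preference-conditioned linear scalarization, a common baseline in multi-objective RL. It is an illustration under stated assumptions, not the architecture proposed here: the abstract does not specify how the preference enters the model, and the function names (`scalarize`, `sample_preference`, `conditioned_observation`), the two-objective reward layout (data collected, negative energy used), and the Dirichlet sampling of preferences are all hypothetical.

```python
import numpy as np


def scalarize(reward_vec, preference):
    """Weighted sum of per-objective rewards (linear scalarization).

    reward_vec: assumed layout [data_collected, -energy_used].
    preference: non-negative weights that sum to 1.
    """
    r = np.asarray(reward_vec, dtype=float)
    w = np.asarray(preference, dtype=float)
    if r.shape != w.shape:
        raise ValueError("reward and preference must have the same shape")
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("preference must be non-negative and sum to 1")
    return float(r @ w)


def sample_preference(rng, n_objectives=2):
    """Draw a random preference from the simplex (assumed: flat Dirichlet)."""
    return rng.dirichlet(np.ones(n_objectives))


def conditioned_observation(obs, preference):
    """Append the preference to the observation so that one policy
    can be queried under different trade-offs without retraining."""
    return np.concatenate([np.asarray(obs, dtype=float),
                           np.asarray(preference, dtype=float)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = sample_preference(rng)
    obs = np.zeros(4)            # placeholder observation
    reward = [4.0, -2.0]         # hypothetical: 4 units collected, 2 units of energy spent
    print(conditioned_observation(obs, w).shape)
    print(scalarize(reward, [0.75, 0.25]))
```

In a sketch like this, a new preference would be sampled per training episode, so the policy sees many trade-offs during training and can then be evaluated under any chosen weighting at test time.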