As wireless networks evolve toward 6G, the low-altitude Internet of Things (IoT), supported by unmanned aerial vehicles (UAVs) with Integrated Sensing and Communication (ISAC) capabilities, provides ground sensing networks with advanced real-time monitoring and data collection capabilities. To maximize the volume of data collected from distributed IoT nodes, AI-powered data collection technology plays a critical role in enabling intelligent decision-making. Among these techniques, deep reinforcement learning (DRL) has gained particular attention. However, existing DRL-based work on UAV-assisted data collection from IoT nodes rarely addresses problems such as unknown interference and dynamic data volume. Moreover, these DRL models have high computational requirements and converge slowly, making them difficult to deploy on UAVs with limited payload and computing power. To address these challenges, a hierarchical deep reinforcement learning (HDRL) approach, which converges quickly and uses smaller models, is designed to optimize UAV trajectories and bandwidth allocation so as to maximize data collection volume. First, the proposed scenario incorporates interference from jammers, dynamic data volume at the IoT nodes, and multiple types of obstacles. The entire task is structured hierarchically: the upper level makes flight trajectory decisions at a coarse temporal granularity, while the lower level makes bandwidth allocation decisions at a finer temporal granularity. Second, a trajectory and bandwidth allocation optimization algorithm based on hierarchical deep deterministic policy gradients (TBH-DDPG) is proposed to solve the problem. Finally, simulation results demonstrate that the proposed algorithm improves convergence speed by 44.44% and reduces computational cost by 58.05% compared with a non-hierarchical algorithm.
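The two-timescale structure described above, with a coarse-grained trajectory decision followed by several fine-grained bandwidth decisions, can be illustrated with a minimal control-loop sketch. This is not the TBH-DDPG algorithm itself: both policies are hand-written placeholders standing in for trained actors, and every name and constant (`K`, `RATE`, `upper_policy`, `lower_policy`, `run_episode`, the arrival model) is a hypothetical chosen only for illustration.

```python
import random

K = 5       # assumed: number of lower-level steps per upper-level step
RATE = 1.0  # assumed: total data the UAV can collect per lower-level step


def upper_policy(uav_pos, rng):
    """Placeholder for the upper-level actor: returns a 2D displacement."""
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))


def lower_policy(queue_sizes):
    """Placeholder for the lower-level actor: bandwidth shares
    proportional to each node's pending data (shares sum to 1)."""
    total = sum(queue_sizes)
    n = len(queue_sizes)
    if total == 0:
        return [1.0 / n] * n
    return [q / total for q in queue_sizes]


def run_episode(num_upper_steps=4, num_nodes=3, seed=0):
    rng = random.Random(seed)
    pos = (0.0, 0.0)
    # Pending data at each IoT node (arbitrary units).
    queues = [rng.uniform(1.0, 5.0) for _ in range(num_nodes)]
    collected = 0.0
    log = []
    for t in range(num_upper_steps):
        # Coarse timescale: one trajectory decision.
        dx, dy = upper_policy(pos, rng)
        pos = (pos[0] + dx, pos[1] + dy)
        for k in range(K):
            # Fine timescale: one bandwidth allocation per sub-step.
            shares = lower_policy(queues)
            for i, share in enumerate(shares):
                sent = min(queues[i], share * RATE)
                queues[i] -= sent
                collected += sent
            # Dynamic data volume: new data arrives at every node.
            queues = [q + rng.uniform(0.0, 0.2) for q in queues]
            log.append((t, k, shares))
    return pos, collected, log
```

In this sketch the upper level acts once per outer iteration and the lower level acts `K` times inside it, so the lower-level policy sees the changing node queues at a finer granularity than the trajectory decision. Jammer interference, obstacles, and the channel model from the scenario are deliberately omitted.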